[jira] [Updated] (HADOOP-19120) [ABFS]: ApacheHttpClient adaptation as network library

2024-06-20 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-19120:
---
Status: Patch Available  (was: Open)

https://github.com/apache/hadoop/pull/6633

> [ABFS]: ApacheHttpClient adaptation as network library
> --
>
> Key: HADOOP-19120
> URL: https://issues.apache.org/jira/browse/HADOOP-19120
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.5.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Apache HttpClient is more feature-rich and flexible, and gives the application
> more granular control over networking parameters.
> ABFS currently relies on the JDK-net library. This library is managed by
> OpenJDK and has no performance problem. However, it limits the application's
> control over networking, and it exposes very few APIs and hooks that the
> application can use to get metrics or to choose which connections are reused
> and when. ApacheHttpClient provides hooks to fetch important metrics and to
> control networking parameters.
> A custom connection-pool implementation is used, adapted from the JDK8
> connection pooling. Reasons for doing it:
> 1. PoolingHttpClientConnectionManager's heuristic caches all the reusable
> connections it has created. The JDK's implementation caches only a limited
> number of connections; the limit is given by the JVM system property
> "http.maxConnections" and defaults to 5 if the property is not set.
> Connection-establishment latency increased when all the connections were
> cached. Hence, the pooling heuristic of the JDK net library is adopted.
> 2. PoolingHttpClientConnectionManager expects the application to provide
> `setMaxPerRoute` and `setMaxTotal`, which it uses as the total number of
> connections it can create. For an application using ABFS, it is not feasible
> to provide such a value when initialising the connectionManager. The JDK's
> implementation has no cap on the number of connections it can have open at a
> given moment. Hence, the pooling heuristic of the JDK net library is adopted.
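
A rough illustration of the pooling heuristic described above (a minimal sketch, not
the actual ABFS implementation; the KeepAliveCache class and its methods are
hypothetical): idle connections are cached only up to the limit read from the
"http.maxConnections" system property, defaulting to 5 as in the JDK, and anything
beyond that cap is closed rather than cached.

{code:java}
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayDeque;

/**
 * Illustrative sketch only. Mirrors the JDK heuristic of caching at most
 * "http.maxConnections" idle connections (default 5) and closing anything
 * beyond that cap instead of caching it.
 */
class KeepAliveCache<C extends Closeable> {
  private final int maxCachedConnections =
      Integer.getInteger("http.maxConnections", 5);
  private final ArrayDeque<C> idleConnections = new ArrayDeque<>();

  /** Offer a reusable connection back to the cache. */
  synchronized void put(C connection) throws IOException {
    if (idleConnections.size() >= maxCachedConnections) {
      connection.close();   // over the cap: close instead of caching
      return;
    }
    idleConnections.push(connection);
  }

  /** Fetch a cached connection, or null if none is idle. */
  synchronized C get() {
    return idleConnections.poll();
  }
}
{code}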



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19120) [ABFS]: ApacheHttpClient adaptation as network library

2024-06-20 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-19120:
---
Fix Version/s: 3.5.0
   3.4.1

> [ABFS]: ApacheHttpClient adaptation as network library
> --
>
> Key: HADOOP-19120
> URL: https://issues.apache.org/jira/browse/HADOOP-19120
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.5.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Apache HttpClient is more feature-rich and flexible, and gives the application
> more granular control over networking parameters.
> ABFS currently relies on the JDK-net library. This library is managed by
> OpenJDK and has no performance problem. However, it limits the application's
> control over networking, and it exposes very few APIs and hooks that the
> application can use to get metrics or to choose which connections are reused
> and when. ApacheHttpClient provides hooks to fetch important metrics and to
> control networking parameters.
> A custom connection-pool implementation is used, adapted from the JDK8
> connection pooling. Reasons for doing it:
> 1. PoolingHttpClientConnectionManager's heuristic caches all the reusable
> connections it has created. The JDK's implementation caches only a limited
> number of connections; the limit is given by the JVM system property
> "http.maxConnections" and defaults to 5 if the property is not set.
> Connection-establishment latency increased when all the connections were
> cached. Hence, the pooling heuristic of the JDK net library is adopted.
> 2. PoolingHttpClientConnectionManager expects the application to provide
> `setMaxPerRoute` and `setMaxTotal`, which it uses as the total number of
> connections it can create. For an application using ABFS, it is not feasible
> to provide such a value when initialising the connectionManager. The JDK's
> implementation has no cap on the number of connections it can have open at a
> given moment. Hence, the pooling heuristic of the JDK net library is adopted.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19137) [ABFS]:Extra getAcl call while calling the very first API of FileSystem

2024-04-25 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-19137:
---
Description: 
The store doesn't flow the namespace information to the client.

In https://github.com/apache/hadoop/pull/6221, getIsNamespaceEnabled is added
to the client methods; it checks whether the namespace information is present,
and if not, it makes a getAcl call and sets the field. Once the field is set,
it is used in future getIsNamespaceEnabled calls for a given AbfsClient.

Since CPK, both global and encryptionContext, is only for HNS accounts, the
proposed fix is to fail the filesystem init if the account is non-HNS and a
CPK config is given.

  was:
The store doesn't flow the namespace information to the client.

In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added
to the client methods; it checks whether the namespace information is present,
and if not, it makes a getAcl call and sets the field. Once the field is set,
it is used in future getIsNamespaceEnabled calls for a given AbfsClient.

Since CPK, both global and encryptionContext, is only for HNS accounts, the
proposed fix is to fail the filesystem init if the account is non-HNS and a
CPK config is given.


> [ABFS]:Extra getAcl call while calling the very first API of FileSystem
> ---
>
> Key: HADOOP-19137
> URL: https://issues.apache.org/jira/browse/HADOOP-19137
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
>
> The store doesn't flow the namespace information to the client.
> In https://github.com/apache/hadoop/pull/6221, getIsNamespaceEnabled is added
> to the client methods; it checks whether the namespace information is present,
> and if not, it makes a getAcl call and sets the field. Once the field is set,
> it is used in future getIsNamespaceEnabled calls for a given AbfsClient.
> Since CPK, both global and encryptionContext, is only for HNS accounts, the
> proposed fix is to fail the filesystem init if the account is non-HNS and a
> CPK config is given.
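
A minimal sketch of the two behaviours described above, assuming hypothetical class
and method names (not the actual AbfsClient code): the namespace flag is resolved
lazily with a single getAcl probe and cached per client, and init fails fast when a
CPK config is combined with a non-HNS account.

{code:java}
import java.io.IOException;

/** Illustrative sketch only; class, enum and method names are hypothetical. */
class NamespaceAwareClient {
  enum Trilean { TRUE, FALSE, UNKNOWN }

  private Trilean isNamespaceEnabled = Trilean.UNKNOWN;

  /** Lazily resolve namespace support; only the first call probes the store. */
  boolean getIsNamespaceEnabled() throws IOException {
    if (isNamespaceEnabled == Trilean.UNKNOWN) {
      // A getAcl call is used as the probe; it is answered only by
      // HNS (namespace-enabled) accounts.
      isNamespaceEnabled = probeWithGetAcl() ? Trilean.TRUE : Trilean.FALSE;
    }
    return isNamespaceEnabled == Trilean.TRUE;
  }

  /** Fail fast at filesystem init if CPK is configured for a non-HNS account. */
  void validateCpkConfig(boolean cpkConfigured) throws IOException {
    if (cpkConfigured && !getIsNamespaceEnabled()) {
      throw new IOException("CPK is only supported on HNS accounts");
    }
  }

  private boolean probeWithGetAcl() throws IOException {
    return true;   // placeholder for the actual getAcl request
  }
}
{code}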



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19137) [ABFS]:Extra getAcl call while calling the very first API of FileSystem

2024-04-19 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-19137:
---
Description: 
The store doesn't flow the namespace information to the client.

In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added
to the client methods; it checks whether the namespace information is present,
and if not, it makes a getAcl call and sets the field. Once the field is set,
it is used in future getIsNamespaceEnabled calls for a given AbfsClient.

Since CPK, both global and encryptionContext, is for HNS accounts, the proposed
fix is to fail the filesystem init if the account is non-HNS and a CPK config
is given.

  was:
The store doesn't flow the namespace information to the client.

In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added
to the client methods; it checks whether the namespace information is present,
and if not, it makes a getAcl call and sets the field. Once the field is set,
it is used in future getIsNamespaceEnabled calls for a given AbfsClient.


> [ABFS]:Extra getAcl call while calling the very first API of FileSystem
> ---
>
> Key: HADOOP-19137
> URL: https://issues.apache.org/jira/browse/HADOOP-19137
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>
> The store doesn't flow the namespace information to the client.
> In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added
> to the client methods; it checks whether the namespace information is present,
> and if not, it makes a getAcl call and sets the field. Once the field is set,
> it is used in future getIsNamespaceEnabled calls for a given AbfsClient.
> Since CPK, both global and encryptionContext, is for HNS accounts, the
> proposed fix is to fail the filesystem init if the account is non-HNS and a
> CPK config is given.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19137) [ABFS]:Extra getAcl call while calling the very first API of FileSystem

2024-04-19 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-19137:
---
Description: 
The store doesn't flow the namespace information to the client.

In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added
to the client methods; it checks whether the namespace information is present,
and if not, it makes a getAcl call and sets the field. Once the field is set,
it is used in future getIsNamespaceEnabled calls for a given AbfsClient.

Since CPK, both global and encryptionContext, is only for HNS accounts, the
proposed fix is to fail the filesystem init if the account is non-HNS and a
CPK config is given.

  was:
The store doesn't flow the namespace information to the client.

In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added
to the client methods; it checks whether the namespace information is present,
and if not, it makes a getAcl call and sets the field. Once the field is set,
it is used in future getIsNamespaceEnabled calls for a given AbfsClient.

Since CPK, both global and encryptionContext, is for HNS accounts, the proposed
fix is to fail the filesystem init if the account is non-HNS and a CPK config
is given.


> [ABFS]:Extra getAcl call while calling the very first API of FileSystem
> ---
>
> Key: HADOOP-19137
> URL: https://issues.apache.org/jira/browse/HADOOP-19137
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>
> The store doesn't flow the namespace information to the client.
> In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added
> to the client methods; it checks whether the namespace information is present,
> and if not, it makes a getAcl call and sets the field. Once the field is set,
> it is used in future getIsNamespaceEnabled calls for a given AbfsClient.
> Since CPK, both global and encryptionContext, is only for HNS accounts, the
> proposed fix is to fail the filesystem init if the account is non-HNS and a
> CPK config is given.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19139) [ABFS]: No GetPathStatus call for opening AbfsInputStream

2024-04-03 Thread Pranav Saxena (Jira)
Pranav Saxena created HADOOP-19139:
--

 Summary: [ABFS]: No GetPathStatus call for opening AbfsInputStream
 Key: HADOOP-19139
 URL: https://issues.apache.org/jira/browse/HADOOP-19139
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Pranav Saxena
Assignee: Pranav Saxena


The Read API gives the contentLen and eTag of the path. This information would
be used in future calls on that inputStream. Prior knowledge of the eTag is of
not much importance.
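
A minimal sketch of the idea, with hypothetical names (not the actual AbfsInputStream
code): the stream is opened without a GetPathStatus call, and the contentLength and
eTag are captured from the first read response for use in later calls.

{code:java}
import java.io.IOException;

/** Illustrative sketch only; class and method names are hypothetical. */
class LazyOpenInputStream {
  private Long contentLength;  // unknown until the first read response
  private String eTag;         // unknown until the first read response

  /** The first read populates contentLength and eTag from the read response,
   *  so opening the stream does not need a prior GetPathStatus call. */
  int read(long position, byte[] buffer) throws IOException {
    ReadResponse response = readFromStore(position, buffer);
    if (contentLength == null) {
      contentLength = response.contentLength;
      eTag = response.eTag;   // reused by later calls on this stream
    }
    return response.bytesRead;
  }

  private ReadResponse readFromStore(long position, byte[] buffer)
      throws IOException {
    // placeholder for the actual Read API call
    return new ReadResponse(buffer.length, "etag-placeholder", buffer.length);
  }

  /** Minimal holder for the fields the Read API returns. */
  static class ReadResponse {
    final long contentLength;
    final String eTag;
    final int bytesRead;
    ReadResponse(long contentLength, String eTag, int bytesRead) {
      this.contentLength = contentLength;
      this.eTag = eTag;
      this.bytesRead = bytesRead;
    }
  }
}
{code}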



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19137) [ABFS]:Extra getAcl call while calling first API of FileSystem

2024-04-02 Thread Pranav Saxena (Jira)
Pranav Saxena created HADOOP-19137:
--

 Summary: [ABFS]:Extra getAcl call while calling first API of 
FileSystem
 Key: HADOOP-19137
 URL: https://issues.apache.org/jira/browse/HADOOP-19137
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Pranav Saxena
Assignee: Pranav Saxena


The store doesn't flow the namespace information to the client.

In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added
to the client methods; it checks whether the namespace information is present,
and if not, it makes a getAcl call and sets the field. Once the field is set,
it is used in future getIsNamespaceEnabled calls for a given AbfsClient.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19137) [ABFS]:Extra getAcl call while calling the very first API of FileSystem

2024-04-02 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-19137:
---
Summary: [ABFS]:Extra getAcl call while calling the very first API of 
FileSystem  (was: [ABFS]:Extra getAcl call while calling first API of 
FileSystem)

> [ABFS]:Extra getAcl call while calling the very first API of FileSystem
> ---
>
> Key: HADOOP-19137
> URL: https://issues.apache.org/jira/browse/HADOOP-19137
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>
> The store doesn't flow the namespace information to the client.
> In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added
> to the client methods; it checks whether the namespace information is present,
> and if not, it makes a getAcl call and sets the field. Once the field is set,
> it is used in future getIsNamespaceEnabled calls for a given AbfsClient.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19120) [ABFS]: ApacheHttpClient adaptation as network library

2024-03-26 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-19120:
---
Description: 
Apache HttpClient is more feature-rich and flexible, and gives the application
more granular control over networking parameters.

ABFS currently relies on the JDK-net library. This library is managed by
OpenJDK and has no performance problem. However, it limits the application's
control over networking, and it exposes very few APIs and hooks that the
application can use to get metrics or to choose which connections are reused
and when. ApacheHttpClient provides hooks to fetch important metrics and to
control networking parameters.

A custom connection-pool implementation is used, adapted from the JDK8
connection pooling. Reasons for doing it:
1. PoolingHttpClientConnectionManager's heuristic caches all the reusable
connections it has created. The JDK's implementation caches only a limited
number of connections; the limit is given by the JVM system property
"http.maxConnections" and defaults to 5 if the property is not set.
Connection-establishment latency increased when all the connections were
cached. Hence, the pooling heuristic of the JDK net library is adopted.
2. PoolingHttpClientConnectionManager expects the application to provide
`setMaxPerRoute` and `setMaxTotal`, which it uses as the total number of
connections it can create. For an application using ABFS, it is not feasible
to provide such a value when initialising the connectionManager. The JDK's
implementation has no cap on the number of connections it can have open at a
given moment. Hence, the pooling heuristic of the JDK net library is adopted.

  was:
Apache HttpClient is more feature-rich and flexible, and gives the application
more granular control over networking parameters.

ABFS currently relies on the JDK-net library. This library is managed by
OpenJDK and has no performance problem. However, it limits the application's
control over networking, and it exposes very few APIs and hooks that the
application can use to get metrics or to choose which connections are reused
and when. ApacheHttpClient provides hooks to fetch important metrics and to
control networking parameters.

A custom connection-pool implementation is used, adapted from the JDK8
connection pooling. Reasons for doing it:
1. PoolingHttpClientConnectionManager's heuristic caches all the reusable
connections it has created. The JDK's implementation caches only a limited
number of connections; the limit is given by the JVM system property
"http.maxConnections" and defaults to 5 if the property is not set.
Connection-establishment latency increased when all the connections were
cached. Hence, the pooling heuristic of the JDK net library is adopted.
2. PoolingHttpClientConnectionManager expects the application to provide
`setMaxPerRoute` and `setMaxTotal`, which it uses as the total number of
connections it can create. For an application using ABFS, it is not feasible
to provide such a value when initialising the connectionManager. The JDK's
implementation has no cap on the number of connections it can have open at a
given moment.


> [ABFS]: ApacheHttpClient adaptation as network library
> --
>
> Key: HADOOP-19120
> URL: https://issues.apache.org/jira/browse/HADOOP-19120
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.5.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
>
> Apache HttpClient is more feature-rich and flexible, and gives the application
> more granular control over networking parameters.
> ABFS currently relies on the JDK-net library. This library is managed by
> OpenJDK and has no performance problem. However, it limits the application's
> control over networking, and it exposes very few APIs and hooks that the
> application can use to get metrics or to choose which connections are reused
> and when. ApacheHttpClient provides hooks to fetch important metrics and to
> control networking parameters.
> A custom connection-pool implementation is used, adapted from the JDK8
> connection pooling. Reasons for doing it:
> 1. PoolingHttpClientConnectionManager's heuristic caches all the reusable
> connections it has created. The JDK's implementation caches only a limited
> number of connections; the limit is given by the JVM system property
> "http.maxConnections" and defaults to 5 if the property is not set.
> Connection-establishment latency increased when all the connections were
> cached. Hence, the pooling heuristic of the JDK net library is adopted.
> 2. In 

[jira] [Updated] (HADOOP-19120) [ABFS]: ApacheHttpClient adaptation as network library

2024-03-26 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-19120:
---
Description: 
Apache HttpClient is more feature-rich and flexible, and gives the application
more granular control over networking parameters.

ABFS currently relies on the JDK-net library. This library is managed by
OpenJDK and has no performance problem. However, it limits the application's
control over networking, and it exposes very few APIs and hooks that the
application can use to get metrics or to choose which connections are reused
and when. ApacheHttpClient provides hooks to fetch important metrics and to
control networking parameters.

A custom connection-pool implementation is used, adapted from the JDK8
connection pooling. Reasons for doing it:
1. PoolingHttpClientConnectionManager's heuristic caches all the reusable
connections it has created. The JDK's implementation caches only a limited
number of connections; the limit is given by the JVM system property
"http.maxConnections" and defaults to 5 if the property is not set.
Connection-establishment latency increased when all the connections were
cached. Hence, the pooling heuristic of the JDK net library is adopted.
2. PoolingHttpClientConnectionManager expects the application to provide
`setMaxPerRoute` and `setMaxTotal`, which it uses as the total number of
connections it can create. For an application using ABFS, it is not feasible
to provide such a value when initialising the connectionManager. The JDK's
implementation has no cap on the number of connections it can have open at a
given moment.

  was:
Apache HttpClient is more feature-rich and flexible, and gives the application
more granular control over networking parameters.

ABFS currently relies on the JDK-net library. This library is managed by
OpenJDK and has no performance problem. However, it limits the application's
control over networking, and it exposes very few APIs and hooks that the
application can use to get metrics or to choose which connections are reused
and when. ApacheHttpClient provides hooks to fetch important metrics and to
control networking parameters.


> [ABFS]: ApacheHttpClient adaptation as network library
> --
>
> Key: HADOOP-19120
> URL: https://issues.apache.org/jira/browse/HADOOP-19120
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.5.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
>
> Apache HttpClient is more feature-rich and flexible, and gives the application
> more granular control over networking parameters.
> ABFS currently relies on the JDK-net library. This library is managed by
> OpenJDK and has no performance problem. However, it limits the application's
> control over networking, and it exposes very few APIs and hooks that the
> application can use to get metrics or to choose which connections are reused
> and when. ApacheHttpClient provides hooks to fetch important metrics and to
> control networking parameters.
> A custom connection-pool implementation is used, adapted from the JDK8
> connection pooling. Reasons for doing it:
> 1. PoolingHttpClientConnectionManager's heuristic caches all the reusable
> connections it has created. The JDK's implementation caches only a limited
> number of connections; the limit is given by the JVM system property
> "http.maxConnections" and defaults to 5 if the property is not set.
> Connection-establishment latency increased when all the connections were
> cached. Hence, the pooling heuristic of the JDK net library is adopted.
> 2. PoolingHttpClientConnectionManager expects the application to provide
> `setMaxPerRoute` and `setMaxTotal`, which it uses as the total number of
> connections it can create. For an application using ABFS, it is not feasible
> to provide such a value when initialising the connectionManager. The JDK's
> implementation has no cap on the number of connections it can have open at a
> given moment.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19120) [ABFS]: ApacheHttpClient adaptation as network library

2024-03-21 Thread Pranav Saxena (Jira)
Pranav Saxena created HADOOP-19120:
--

 Summary: [ABFS]: ApacheHttpClient adaptation as network library
 Key: HADOOP-19120
 URL: https://issues.apache.org/jira/browse/HADOOP-19120
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.5.0
Reporter: Pranav Saxena
Assignee: Pranav Saxena


Apache HttpClient is more feature-rich and flexible, and gives the application
more granular control over networking parameters.

ABFS currently relies on the JDK-net library. This library is managed by
OpenJDK and has no performance problem. However, it limits the application's
control over networking, and it exposes very few APIs and hooks that the
application can use to get metrics or to choose which connections are reused
and when. ApacheHttpClient provides hooks to fetch important metrics and to
control networking parameters.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize

2024-03-07 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-19102:
---
Description: 
The method `optimisedRead` creates a buffer array of size `readBufferSize`. If 
footerReadBufferSize is greater than readBufferSize, abfs will attempt to read 
more data than the buffer array can hold, which causes an exception.

Change: To avoid this, we will keep footerBufferSize = 
min(readBufferSizeConfig, footerBufferSizeConfig)

 

 

  was:
The method `optimisedRead` creates a buffer array of size `readBufferSize`. If 
footerReadBufferSize is greater than readBufferSize, abfs will attempt to read 
more data than the buffer array can hold, which causes an exception.

Change: To avoid this, we will assign readBufferSize to footerReadBufferSize 
when footerReadBufferSize is larger than readBufferSize.

 

 


> [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
> --
>
> Key: HADOOP-19102
> URL: https://issues.apache.org/jira/browse/HADOOP-19102
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
> Fix For: 3.4.0, 3.5.0
>
>
> The method `optimisedRead` creates a buffer array of size `readBufferSize`. 
> If footerReadBufferSize is greater than readBufferSize, abfs will attempt to 
> read more data than the buffer array can hold, which causes an exception.
> Change: To avoid this, we will keep footerBufferSize = 
> min(readBufferSizeConfig, footerBufferSizeConfig)
>  
>  
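
A minimal sketch of the proposed capping, with hypothetical parameter names: the
effective footer buffer size is the minimum of the two configured values, so a footer
read can never exceed the buffer array allocated by `optimisedRead`.

{code:java}
/** Illustrative sketch of the proposed capping; names are hypothetical. */
static int effectiveFooterReadBufferSize(int readBufferSizeConfig,
                                         int footerReadBufferSizeConfig) {
  // footerBufferSize = min(readBufferSizeConfig, footerBufferSizeConfig):
  // the footer read then always fits in the buffer array of size
  // readBufferSizeConfig, so no out-of-bounds read is attempted.
  return Math.min(readBufferSizeConfig, footerReadBufferSizeConfig);
}
{code}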



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize

2024-03-06 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-19102:
---
Fix Version/s: 3.4.0
   3.5.0

> [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
> --
>
> Key: HADOOP-19102
> URL: https://issues.apache.org/jira/browse/HADOOP-19102
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
> Fix For: 3.4.0, 3.5.0
>
>
> The method `optimisedRead` creates a buffer array of size `readBufferSize`. 
> If footerReadBufferSize is greater than readBufferSize, abfs will attempt to 
> read more data than the buffer array can hold, which causes an exception.
> Change: To avoid this, we will assign readBufferSize to footerReadBufferSize 
> when footerReadBufferSize is larger than readBufferSize.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize

2024-03-06 Thread Pranav Saxena (Jira)
Pranav Saxena created HADOOP-19102:
--

 Summary: [ABFS]: FooterReadBufferSize should not be greater than 
readBufferSize
 Key: HADOOP-19102
 URL: https://issues.apache.org/jira/browse/HADOOP-19102
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Pranav Saxena
Assignee: Pranav Saxena


The method `optimisedRead` creates a buffer array of size `readBufferSize`. If 
footerReadBufferSize is greater than readBufferSize, abfs will attempt to read 
more data than the buffer array can hold, which causes an exception.

Change: To avoid this, we will assign readBufferSize to footerReadBufferSize 
when footerReadBufferSize is larger than readBufferSize.

 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Comment Edited] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls

2023-11-06 Thread Pranav Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783216#comment-17783216
 ] 

Pranav Saxena edited comment on HADOOP-18883 at 11/6/23 1:16 PM:
-

Hi [~ste...@apache.org].   Requesting you to kindly review the PR please. This 
is PR to prevent day-0 JDK bug around expect-100 in abfs. Would be really 
awesome to get your feedback on this. Thank you so much.


was (Author: pranavsaxena):
Hi [~ste...@apache.org]   Requesting you to kindly review the PR please. This 
is PR to prevent day-0 JDK bug around expect-100 in abfs. Would be really 
awesome to get your feedback on this. Thank you so much.

> Expect-100 JDK bug resolution: prevent multiple server calls
> 
>
> Key: HADOOP-18883
> URL: https://issues.apache.org/jira/browse/HADOOP-18883
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> This is in line with the JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978].
>  
> With the current implementation of HttpURLConnection, if the server rejects the
> “Expect 100-continue” header, a ‘java.net.ProtocolException’ is thrown from the
> 'expect100Continue()' method.
> After the exception is thrown, if we call any other method on the same instance
> (e.g. getHeaderField() or getHeaderFields()), they will internally call
> getOutputStream(), which invokes writeRequests(), which makes the actual server
> call.
> In AbfsHttpOperation, after sendRequest() we call the processResponse() method
> from AbfsRestOperation. Even if conn.getOutputStream() fails due to the
> expect-100 error, we consume the exception and let the code go ahead. So
> getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be triggered
> after getOutputStream() has failed. These invocations will lead to server calls.
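
A minimal sketch of the kind of guard described above, using hypothetical names rather
than the actual AbfsHttpOperation code: the expect-100 failure is recorded when
getOutputStream() throws, and header getters are short-circuited afterwards so they
cannot trigger a second server call.

{code:java}
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.ProtocolException;

/** Illustrative sketch only; not the actual AbfsHttpOperation implementation. */
class Expect100AwareOperation {
  private final HttpURLConnection connection;
  private boolean expect100Failed;

  Expect100AwareOperation(HttpURLConnection connection) {
    this.connection = connection;
  }

  void sendRequest(byte[] body) throws IOException {
    try {
      connection.getOutputStream().write(body);
    } catch (ProtocolException e) {
      // Server rejected "Expect: 100-continue"; remember it instead of
      // letting later header accesses re-issue the request.
      expect100Failed = true;
    }
  }

  String header(String name) {
    // Skip getHeaderField() when the expect-100 handshake failed, because it
    // would internally call getOutputStream() -> writeRequests() and make an
    // unintended server call.
    return expect100Failed ? null : connection.getHeaderField(name);
  }
}
{code}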



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls

2023-11-06 Thread Pranav Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783216#comment-17783216
 ] 

Pranav Saxena commented on HADOOP-18883:


Hi [~ste...@apache.org]   Requesting you to kindly review the PR please. This 
is PR to prevent day-0 JDK bug around expect-100 in abfs. Would be really 
awesome to get your feedback on this. Thank you so much.

> Expect-100 JDK bug resolution: prevent multiple server calls
> 
>
> Key: HADOOP-18883
> URL: https://issues.apache.org/jira/browse/HADOOP-18883
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> This is in line with the JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978].
>  
> With the current implementation of HttpURLConnection, if the server rejects the
> “Expect 100-continue” header, a ‘java.net.ProtocolException’ is thrown from the
> 'expect100Continue()' method.
> After the exception is thrown, if we call any other method on the same instance
> (e.g. getHeaderField() or getHeaderFields()), they will internally call
> getOutputStream(), which invokes writeRequests(), which makes the actual server
> call.
> In AbfsHttpOperation, after sendRequest() we call the processResponse() method
> from AbfsRestOperation. Even if conn.getOutputStream() fails due to the
> expect-100 error, we consume the exception and let the code go ahead. So
> getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be triggered
> after getOutputStream() has failed. These invocations will lead to server calls.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18960) ABFS contract-tests with Hadoop-Commons failing

2023-10-31 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-18960:
---
Fix Version/s: 3.4.0
   (was: 3.3.6)

> ABFS contract-tests with Hadoop-Commons failing
> ---
>
> Key: HADOOP-18960
> URL: https://issues.apache.org/jira/browse/HADOOP-18960
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Priority: Minor
> Fix For: 3.4.0
>
>
> In the merged PR [HADOOP-18869: [ABFS] Fixing Behavior of a File System APIs
> on root path by anujmodi2021 · Pull Request #6003 · apache/hadoop
> (github.com)|https://github.com/apache/hadoop/pull/6003], a config was
> switched on: `fs.contract.test.root-tests-enabled`. This enables the root
> manipulation tests for the filesystem contract.
> Now, the contract tests in ABFS execute as per the executionId
> integration-test-abfs-parallel-classes of the pom. The tests run in different
> JVMs, and at a given instant there can be multiple such JVMs, depending on
> ${testsThreadCount}. The problem is that all the test JVMs for the contract
> tests use the same container for test runs, which is defined by
> `fs.contract.test.fs.abfs`. Because of this, one JVM's root-contract runs can
> influence another JVM's root-contract runs. This leads to CI failures for the
> hadoop-azure package.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18960) ABFS contract-tests with Hadoop-Commons intermittently failing

2023-10-31 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-18960:
---
Summary: ABFS contract-tests with Hadoop-Commons intermittently failing  
(was: ABFS contract-tests with Hadoop-Commons failing)

> ABFS contract-tests with Hadoop-Commons intermittently failing
> --
>
> Key: HADOOP-18960
> URL: https://issues.apache.org/jira/browse/HADOOP-18960
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Priority: Minor
> Fix For: 3.4.0
>
>
> In the merged PR [HADOOP-18869: [ABFS] Fixing Behavior of a File System APIs
> on root path by anujmodi2021 · Pull Request #6003 · apache/hadoop
> (github.com)|https://github.com/apache/hadoop/pull/6003], a config was
> switched on: `fs.contract.test.root-tests-enabled`. This enables the root
> manipulation tests for the filesystem contract.
> Now, the contract tests in ABFS execute as per the executionId
> integration-test-abfs-parallel-classes of the pom. The tests run in different
> JVMs, and at a given instant there can be multiple such JVMs, depending on
> ${testsThreadCount}. The problem is that all the test JVMs for the contract
> tests use the same container for test runs, which is defined by
> `fs.contract.test.fs.abfs`. Because of this, one JVM's root-contract runs can
> influence another JVM's root-contract runs. This leads to CI failures for the
> hadoop-azure package.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-18960) ABFS contract-tests with Hadoop-Commons failing

2023-10-31 Thread Pranav Saxena (Jira)
Pranav Saxena created HADOOP-18960:
--

 Summary: ABFS contract-tests with Hadoop-Commons failing
 Key: HADOOP-18960
 URL: https://issues.apache.org/jira/browse/HADOOP-18960
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Pranav Saxena
 Fix For: 3.3.6


In the merged PR [HADOOP-18869: [ABFS] Fixing Behavior of a File System APIs on
root path by anujmodi2021 · Pull Request #6003 · apache/hadoop
(github.com)|https://github.com/apache/hadoop/pull/6003], a config was
switched on: `fs.contract.test.root-tests-enabled`. This enables the root
manipulation tests for the filesystem contract.

Now, the contract tests in ABFS execute as per the executionId
integration-test-abfs-parallel-classes of the pom. The tests run in different
JVMs, and at a given instant there can be multiple such JVMs, depending on
${testsThreadCount}. The problem is that all the test JVMs for the contract
tests use the same container for test runs, which is defined by
`fs.contract.test.fs.abfs`. Because of this, one JVM's root-contract runs can
influence another JVM's root-contract runs. This leads to CI failures for the
hadoop-azure package.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls

2023-10-22 Thread Pranav Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778522#comment-17778522
 ] 

Pranav Saxena commented on HADOOP-18883:


Hi [~ste...@apache.org] [~mehakmeetSingh] . Requesting you to kindly review the 
PR please. Thank you so much.

> Expect-100 JDK bug resolution: prevent multiple server calls
> 
>
> Key: HADOOP-18883
> URL: https://issues.apache.org/jira/browse/HADOOP-18883
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> This is in line with the JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978].
>  
> With the current implementation of HttpURLConnection, if the server rejects the
> “Expect 100-continue” header, a ‘java.net.ProtocolException’ is thrown from the
> 'expect100Continue()' method.
> After the exception is thrown, if we call any other method on the same instance
> (e.g. getHeaderField() or getHeaderFields()), they will internally call
> getOutputStream(), which invokes writeRequests(), which makes the actual server
> call.
> In AbfsHttpOperation, after sendRequest() we call the processResponse() method
> from AbfsRestOperation. Even if conn.getOutputStream() fails due to the
> expect-100 error, we consume the exception and let the code go ahead. So
> getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be triggered
> after getOutputStream() has failed. These invocations will lead to server calls.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls

2023-09-13 Thread Pranav Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764986#comment-17764986
 ] 

Pranav Saxena commented on HADOOP-18883:


Hi [~ste...@apache.org],

This Jira is not related to HADOOP-18865. This Jira is a resolution on ABFS 
side for the JDK bug. This JDK bug has always been there. Since, it is 
discovered now, we want to have a resolution on our side.

 

Thank you.

> Expect-100 JDK bug resolution: prevent multiple server calls
> 
>
> Key: HADOOP-18883
> URL: https://issues.apache.org/jira/browse/HADOOP-18883
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
> Fix For: 3.4.0
>
>
> This is in line with the JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978].
>  
> With the current implementation of HttpURLConnection, if the server rejects the
> “Expect 100-continue” header, a ‘java.net.ProtocolException’ is thrown from the
> 'expect100Continue()' method.
> After the exception is thrown, if we call any other method on the same instance
> (e.g. getHeaderField() or getHeaderFields()), they will internally call
> getOutputStream(), which invokes writeRequests(), which makes the actual server
> call.
> In AbfsHttpOperation, after sendRequest() we call the processResponse() method
> from AbfsRestOperation. Even if conn.getOutputStream() fails due to the
> expect-100 error, we consume the exception and let the code go ahead. So
> getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be triggered
> after getOutputStream() has failed. These invocations will lead to server calls.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls

2023-09-07 Thread Pranav Saxena (Jira)
Pranav Saxena created HADOOP-18883:
--

 Summary: Expect-100 JDK bug resolution: prevent multiple server 
calls
 Key: HADOOP-18883
 URL: https://issues.apache.org/jira/browse/HADOOP-18883
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Pranav Saxena
Assignee: Pranav Saxena
 Fix For: 3.4.0


This is in line with the JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978].

With the current implementation of HttpURLConnection, if the server rejects the
“Expect 100-continue” header, a ‘java.net.ProtocolException’ is thrown from the
'expect100Continue()' method.

After the exception is thrown, if we call any other method on the same instance
(e.g. getHeaderField() or getHeaderFields()), they will internally call
getOutputStream(), which invokes writeRequests(), which makes the actual server
call.

In AbfsHttpOperation, after sendRequest() we call the processResponse() method
from AbfsRestOperation. Even if conn.getOutputStream() fails due to the
expect-100 error, we consume the exception and let the code go ahead. So
getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be triggered
after getOutputStream() has failed. These invocations will lead to server calls.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] (HADOOP-18873) ABFS: AbfsOutputStream doesnt close DataBlocks object.

2023-09-01 Thread Pranav Saxena (Jira)


[ https://issues.apache.org/jira/browse/HADOOP-18873 ]


Pranav Saxena deleted comment on HADOOP-18873:


was (Author: pranavsaxena):
pr: [HADOOP-18873. ABFS: AbfsOutputStream doesnt close DataBlocks object. by 
saxenapranav · Pull Request #6010 · apache/hadoop 
(github.com)|https://github.com/apache/hadoop/pull/6010]

> ABFS: AbfsOutputStream doesnt close DataBlocks object.
> --
>
> Key: HADOOP-18873
> URL: https://issues.apache.org/jira/browse/HADOOP-18873
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.3.4
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
> Fix For: 3.3.4
>
>
> AbfsOutputStream doesn't close the dataBlock object created for the upload.
> What is the implication of not doing that:
> DataBlocks has three implementations:
>  # ByteArrayBlock
>  ## This creates an object of DataBlockByteArrayOutputStream (a child of
> ByteArrayOutputStream: a wrapper around a byte-array for populating and
> reading the array).
>  ## This gets GCed.
>  # ByteBufferBlock:
>  ## There is a defined *DirectBufferPool* from which it tries to request the
> directBuffer.
>  ## If there is nothing in the pool, a new directBuffer is created.
>  ## The `close` method on this object has the responsibility of returning
> the buffer to the pool so it can be reused.
>  ## Since we are not calling `close`:
>  ### The pool is rendered less useful, since each request creates a new
> directBuffer from memory.
>  ### All the objects can be GCed and the allocated direct memory may be
> returned on GC. But if the process crashes, the memory never goes back and
> can cause memory issues on the machine.
>  # DiskBlock:
>  ## This creates a file on disk to which the data-to-upload is written. This
> file gets deleted in startUpload().close().
>  
> startUpload() gives an object of BlockUploadData, which provides the method
> `toByteArray()` that is used in AbfsOutputStream to get the byte array in the
> dataBlock.
>  
> Method which uses the DataBlock object: 
> https://github.com/apache/hadoop/blob/fac7d26c5d7f791565cc3ab45d079e2cca725f95/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java#L298
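
A minimal sketch of the fix implied above, with hypothetical type names (not the
actual DataBlocks API): the block is closed in a finally once its upload is done, so
a ByteBufferBlock can return its direct buffer to the pool.

{code:java}
import java.io.Closeable;
import java.io.IOException;

/**
 * Illustrative sketch only; the DataBlock interface below is a placeholder,
 * not the actual Hadoop DataBlocks API.
 */
class UploadHelper {
  interface DataBlock extends Closeable {
    byte[] startUploadAndGetBytes() throws IOException;
  }

  void uploadBlock(DataBlock block) throws IOException {
    try {
      byte[] payload = block.startUploadAndGetBytes();
      writeToStore(payload);
    } finally {
      // Without this close(), a ByteBufferBlock never returns its direct
      // buffer to the DirectBufferPool for reuse.
      block.close();
    }
  }

  private void writeToStore(byte[] payload) {
    // placeholder for the actual append/flush to the store
  }
}
{code}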



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18873) ABFS: AbfsOutputStream doesnt close DataBlocks object.

2023-09-01 Thread Pranav Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761263#comment-17761263
 ] 

Pranav Saxena commented on HADOOP-18873:


pr: [HADOOP-18873. ABFS: AbfsOutputStream doesnt close DataBlocks object. by 
saxenapranav · Pull Request #6010 · apache/hadoop 
(github.com)|https://github.com/apache/hadoop/pull/6010]

> ABFS: AbfsOutputStream doesnt close DataBlocks object.
> --
>
> Key: HADOOP-18873
> URL: https://issues.apache.org/jira/browse/HADOOP-18873
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.3.4
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
> Fix For: 3.3.4
>
>
> AbfsOutputStream doesn't close the dataBlock object created for the upload.
> What is the implication of not doing that:
> DataBlocks has three implementations:
>  # ByteArrayBlock
>  ## This creates an object of DataBlockByteArrayOutputStream (a child of
> ByteArrayOutputStream: a wrapper around a byte-array for populating and
> reading the array).
>  ## This gets GCed.
>  # ByteBufferBlock:
>  ## There is a defined *DirectBufferPool* from which it tries to request the
> directBuffer.
>  ## If there is nothing in the pool, a new directBuffer is created.
>  ## The `close` method on this object has the responsibility of returning
> the buffer to the pool so it can be reused.
>  ## Since we are not calling `close`:
>  ### The pool is rendered less useful, since each request creates a new
> directBuffer from memory.
>  ### All the objects can be GCed and the allocated direct memory may be
> returned on GC. But if the process crashes, the memory never goes back and
> can cause memory issues on the machine.
>  # DiskBlock:
>  ## This creates a file on disk to which the data-to-upload is written. This
> file gets deleted in startUpload().close().
>  
> startUpload() gives an object of BlockUploadData, which provides the method
> `toByteArray()` that is used in AbfsOutputStream to get the byte array in the
> dataBlock.
>  
> Method which uses the DataBlock object: 
> https://github.com/apache/hadoop/blob/fac7d26c5d7f791565cc3ab45d079e2cca725f95/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java#L298



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18873) ABFS: AbfsOutputStream doesnt close DataBlocks object.

2023-08-30 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-18873:
---
Description: 
AbfsOutputStream doesn't close the dataBlock object created for the upload.

What is the implication of not doing that:
DataBlocks has three implementations:
 # ByteArrayBlock
 ## This creates an object of DataBlockByteArrayOutputStream (a child of
ByteArrayOutputStream: a wrapper around a byte-array for populating and reading
the array).
 ## This gets GCed.
 # ByteBufferBlock:
 ## There is a defined *DirectBufferPool* from which it tries to request the
directBuffer.
 ## If there is nothing in the pool, a new directBuffer is created.
 ## The `close` method on this object has the responsibility of returning the
buffer to the pool so it can be reused.
 ## Since we are not calling `close`:
 ### The pool is rendered less useful, since each request creates a new
directBuffer from memory.
 ### All the objects can be GCed and the allocated direct memory may be
returned on GC. But if the process crashes, the memory never goes back and can
cause memory issues on the machine.
 # DiskBlock:
 ## This creates a file on disk to which the data-to-upload is written. This
file gets deleted in startUpload().close().

startUpload() gives an object of BlockUploadData, which provides the method
`toByteArray()` that is used in AbfsOutputStream to get the byte array in the
dataBlock.

Method which uses the DataBlock object: 
https://github.com/apache/hadoop/blob/fac7d26c5d7f791565cc3ab45d079e2cca725f95/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java#L298

  was:
AbfsOutputStream doesn't close the dataBlock object created for the upload.

What is the implication of not doing that:
DataBlocks has three implementations:
 # ByteArrayBlock
 ## This creates an object of DataBlockByteArrayOutputStream (a child of
ByteArrayOutputStream: a wrapper around a byte-array for populating and reading
the array).
 ## This gets GCed.
 # ByteBufferBlock:
 ## There is a defined *DirectBufferPool* from which it tries to request the
directBuffer.
 ## If there is nothing in the pool, a new directBuffer is created.
 ## The `close` method on this object has the responsibility of returning the
buffer to the pool so it can be reused.
 ## Since we are not calling `close`:
 ### The pool is rendered less useful, since each request creates a new
directBuffer from memory.
 ### All the objects can be GCed and the allocated direct memory may be
returned on GC. But if the process crashes, the memory never goes back and can
cause memory issues on the machine.
 # DiskBlock:
 ## This creates a file on disk to which the data-to-upload is written. This
file gets deleted in startUpload().close().

startUpload() gives an object of BlockUploadData, which provides the method
`toByteArray()` that is used in AbfsOutputStream to get the byte array in the
dataBlock.


> ABFS: AbfsOutputStream doesnt close DataBlocks object.
> --
>
> Key: HADOOP-18873
> URL: https://issues.apache.org/jira/browse/HADOOP-18873
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.3.4
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
> Fix For: 3.3.4
>
>
> AbfsOutputStream doesn't close the dataBlock object created for the upload.
> What is the implication of not doing that:
> DataBlocks has three implementations:
>  # ByteArrayBlock
>  ## This creates an object of DataBlockByteArrayOutputStream (a child of
> ByteArrayOutputStream: a wrapper around a byte-array for populating and
> reading the array).
>  ## This gets GCed.
>  # ByteBufferBlock:
>  ## There is a defined *DirectBufferPool* from which it tries to request the
> directBuffer.
>  ## If there is nothing in the pool, a new directBuffer is created.
>  ## The `close` method on this object has the responsibility of returning
> the buffer to the pool so it can be reused.
>  ## Since we are not calling `close`:
>  ### The pool is rendered less useful, since each request creates a new
> directBuffer from memory.
>  ### All the objects can be GCed and the allocated direct memory may be
> returned on GC. But if the process crashes, the memory never goes back and
> can cause memory issues on the machine.
>  # DiskBlock:
>  ## This creates a file on disk to which the data-to-upload is written. This
> file gets deleted in startUpload().close().
>  
> startUpload() gives an object of BlockUploadData, which provides the method
> `toByteArray()` that is used in AbfsOutputStream to get the byte array in the
> dataBlock.
>  
> Method which uses the DataBlock object: 
> 

[jira] [Updated] (HADOOP-18873) ABFS: AbfsOutputStream doesnt close DataBlocks object.

2023-08-30 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-18873:
---
Description: 
AbfsOutputStream doesn't close the dataBlock object created for the upload.

What is the implication of not doing that:
DataBlocks has three implementations:
 # ByteArrayBlock
 ## This creates an object of DataBlockByteArrayOutputStream (a child of
ByteArrayOutputStream: a wrapper around a byte-array for populating and reading
the array).
 ## This gets GCed.
 # ByteBufferBlock:
 ## There is a defined *DirectBufferPool* from which it tries to request the
directBuffer.
 ## If there is nothing in the pool, a new directBuffer is created.
 ## The `close` method on this object has the responsibility of returning the
buffer to the pool so it can be reused.
 ## Since we are not calling `close`:
 ### The pool is rendered less useful, since each request creates a new
directBuffer from memory.
 ### All the objects can be GCed and the allocated direct memory may be
returned on GC. But if the process crashes, the memory never goes back and can
cause memory issues on the machine.
 # DiskBlock:
 ## This creates a file on disk to which the data-to-upload is written. This
file gets deleted in startUpload().close().

startUpload() gives an object of BlockUploadData, which provides the method
`toByteArray()` that is used in AbfsOutputStream to get the byte array in the
dataBlock.

  was:
AbfsOutputStream doesn't close the dataBlock object created for the upload.

What is the implication of not doing that:
DataBlocks has three implementations:
 # ByteArrayBlock
 ## This creates an object of DataBlockByteArrayOutputStream (a child of 
ByteArrayOutputStream: a wrapper around a byte array for populating and reading 
the array).
 ## This gets GCed.
 # ByteBufferBlock:
 ## There is a defined *DirectBufferPool* from which it tries to request the 
directBuffer.
 ## If nothing is in the pool, a new directBuffer is created.
 ## The `close` method on this object has the responsibility of returning the 
buffer to the pool so it can be reused.
 ## Since we are not calling `close`:
 ### The pool becomes less useful, since each request creates a new directBuffer 
from memory.
 ### All the objects can be GCed and the allocated direct memory may be returned 
on GC. If the process crashes before that happens, the memory is never returned, 
which can cause memory issues on the machine.
 # DiskBlock:
 ## This creates a file on disk to which the data-to-upload is written. This file 
gets deleted in startUpload().close().


> ABFS: AbfsOutputStream doesn't close DataBlocks object.
> --
>
> Key: HADOOP-18873
> URL: https://issues.apache.org/jira/browse/HADOOP-18873
> Project: Hadoop Common
>  Issue Type: Sub-task
>Affects Versions: 3.3.4
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
> Fix For: 3.3.4
>
>
> AbfsOutputStream doesn't close the dataBlock object created for the upload.
> What is the implication of not doing that:
> DataBlocks has three implementations:
>  # ByteArrayBlock
>  ## This creates an object of DataBlockByteArrayOutputStream (a child of 
> ByteArrayOutputStream: a wrapper around a byte array for populating and reading 
> the array).
>  ## This gets GCed.
>  # ByteBufferBlock:
>  ## There is a defined *DirectBufferPool* from which it tries to request the 
> directBuffer.
>  ## If nothing is in the pool, a new directBuffer is created.
>  ## The `close` method on this object has the responsibility of returning the 
> buffer to the pool so it can be reused.
>  ## Since we are not calling `close`:
>  ### The pool becomes less useful, since each request creates a new 
> directBuffer from memory.
>  ### All the objects can be GCed and the allocated direct memory may be 
> returned on GC. If the process crashes before that happens, the memory is never 
> returned, which can cause memory issues on the machine.
>  # DiskBlock:
>  ## This creates a file on disk to which the data-to-upload is written. This 
> file gets deleted in startUpload().close().
>  
> startUpload() returns a BlockUploadData object, whose `toByteArray()` method is 
> used in AbfsOutputStream to get the byte array in the dataBlock.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-18873) ABFS: AbfsOutputStream doesn't close DataBlocks object.

2023-08-30 Thread Pranav Saxena (Jira)
Pranav Saxena created HADOOP-18873:
--

 Summary: ABFS: AbfsOutputStream doesn't close DataBlocks object.
 Key: HADOOP-18873
 URL: https://issues.apache.org/jira/browse/HADOOP-18873
 Project: Hadoop Common
  Issue Type: Sub-task
Affects Versions: 3.3.4
Reporter: Pranav Saxena
Assignee: Pranav Saxena
 Fix For: 3.3.4


AbfsOutputStream doesn't close the dataBlock object created for the upload.

What is the implication of not doing that:
DataBlocks has three implementations:
 # ByteArrayBlock
 ## This creates an object of DataBlockByteArrayOutputStream (a child of 
ByteArrayOutputStream: a wrapper around a byte array for populating and reading 
the array).
 ## This gets GCed.
 # ByteBufferBlock:
 ## There is a defined *DirectBufferPool* from which it tries to request the 
directBuffer.
 ## If nothing is in the pool, a new directBuffer is created.
 ## The `close` method on this object has the responsibility of returning the 
buffer to the pool so it can be reused.
 ## Since we are not calling `close`:
 ### The pool becomes less useful, since each request creates a new directBuffer 
from memory.
 ### All the objects can be GCed and the allocated direct memory may be returned 
on GC. If the process crashes before that happens, the memory is never returned, 
which can cause memory issues on the machine.
 # DiskBlock:
 ## This creates a file on disk to which the data-to-upload is written. This file 
gets deleted in startUpload().close().



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18580) Special characters handling in Azure file system is different from others

2023-05-30 Thread Pranav Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727787#comment-17727787
 ] 

Pranav Saxena commented on HADOOP-18580:


AbfsClient#createRequestUrl uses URLEncoder.encode(val, UTF_8).

{code:java}
String val = ".../part=a%25percent/test.parquet";
System.out.println(URLEncoder.encode(val, UTF_8));
{code}

gives:
`...%2Fpart%3Da%2525percent%2Ftest.parquet`

i.e. the "%25" that is already encoded in the path gets encoded again to "%2525". 
We can do something like:

{code:java}
URLEncoder.encode(val, UTF_8).replace("%2525", "%25")
{code}


> Special characters handling in Azure file system is different from others
> -
>
> Key: HADOOP-18580
> URL: https://issues.apache.org/jira/browse/HADOOP-18580
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.2.0
>Reporter: Yuya Ebihara
>Priority: Minor
>
> Special characters handling in 
> [AzureBlobFileSystem|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.java]
>  looks different from other file systems. e.g. GoogleHadoopFileSystem.
> For instance, FileSystem.open with ".../part=a%25percent/test.parquet" works 
> fine in other file systems, but ABFS requires 
> ".../part=a%percent/test.parquet" (%25 → %) because the path is URL encoded 
> by AbfsClient#createRequestUrl internally. Can we change the behavior? Or can 
> I request to add javadoc to explain the behavior if this is the expected 
> behavior? 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18606) Add reason in x-ms-client-request-id on a retry API call.

2023-03-08 Thread Pranav Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697784#comment-17697784
 ] 

Pranav Saxena commented on HADOOP-18606:


PR for backport to branch-3.3: https://github.com/apache/hadoop/pull/5461

> Add reason in x-ms-client-request-id on a retry API call.
> 
>
> Key: HADOOP-18606
> URL: https://issues.apache.org/jira/browse/HADOOP-18606
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> In the header, x-ms-client-request-id contains information on which retry this 
> particular API call is, for example: 
> :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1.
> We want to add the reason for the retry in the header value: the same header 
> would include the retry reason when it is not the 0th iteration of the API 
> operation. It would look like
> :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_RT.
> This indicates that it is retry number 1 and that the 0th iteration failed due 
> to a read timeout.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-18647) x-ms-client-request-id to have some way that identifies retry of an API.

2023-02-27 Thread Pranav Saxena (Jira)
Pranav Saxena created HADOOP-18647:
--

 Summary: x-ms-client-request-id to have some way that identifies 
retry of an API.
 Key: HADOOP-18647
 URL: https://issues.apache.org/jira/browse/HADOOP-18647
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Pranav Saxena
Assignee: Pranav Saxena
 Fix For: 3.4.0


In case the primaryRequestId in x-ms-client-request-id is an empty string, the 
retry's primaryRequestId has to contain the last part of the clientRequestId UUID.
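
A small sketch of how that fallback could be derived (variable names here are 
illustrative, not necessarily the actual ABFS code):

{code:java}
// If primaryRequestId is empty, fall back to the last segment of the clientRequestId UUID.
String clientRequestId = "eb06d8f6-5693-461b-b63c-5858fa7655e6";
String primaryRequestId = "";
if (primaryRequestId.isEmpty()) {
  String[] parts = clientRequestId.split("-");
  primaryRequestId = parts[parts.length - 1];   // "5858fa7655e6"
}
{code}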



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18547) Check if config value is not empty string in AbfsConfiguration.getMandatoryPasswordString()

2023-02-12 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-18547:
---
Description: 
The method `getMandatoryPasswordString` is called in 
`AbfsConfiguration.getTokenProvider()` to check that the following configs are 
non-null (different keys apply to different implementations of 
AccessTokenProvider):

1. fs.azure.account.oauth2.client.endpoint: in ClientCredsTokenProvider
2. fs.azure.account.oauth2.client.id: in ClientCredsTokenProvider, 
MsiTokenProvider, RefreshTokenBasedTokenProvider
3. fs.azure.account.oauth2.client.secret: in ClientCredsTokenProvider
4. fs.azure.account.oauth2.client.endpoint: in UserPasswordTokenProvider
5. fs.azure.account.oauth2.user.name: in UserPasswordTokenProvider
6. fs.azure.account.oauth2.user.password: in  UserPasswordTokenProvider
7. fs.azure.account.oauth2.msi.tenant: in MsiTokenProvider
8. fs.azure.account.oauth2.refresh.token: in RefreshTokenBasedTokenProvider

Right now, this method only checks that the value is non-null, not that it is 
non-empty. This task adds a check that these config values are also non-empty.
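
A sketch of the intended validation (the method and exception below are 
illustrative, not necessarily the exact AbfsConfiguration code): treat an empty 
string the same as a missing value for the mandatory configs listed above.

{code:java}
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;

final class MandatoryConfigSketch {
  // Reject both missing and empty values for a mandatory configuration key.
  static String getMandatoryString(Configuration conf, String key) throws IOException {
    String value = conf.get(key);
    if (value == null || value.trim().isEmpty()) {
      throw new IOException("Mandatory configuration " + key + " is missing or empty");
    }
    return value;
  }
}
{code}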

> Check if config value is not empty string in 
> AbfsConfiguration.getMandatoryPasswordString()
> ---
>
> Key: HADOOP-18547
> URL: https://issues.apache.org/jira/browse/HADOOP-18547
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.4
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Minor
>  Labels: pull-request-available
>
> The method `getMandatoryPasswordString` is called in 
> `AbfsConfiguration.getTokenProvider()` to check that the following configs are 
> non-null (different keys apply to different implementations of 
> AccessTokenProvider):
> 1. fs.azure.account.oauth2.client.endpoint: in ClientCredsTokenProvider
> 2. fs.azure.account.oauth2.client.id: in ClientCredsTokenProvider, 
> MsiTokenProvider, RefreshTokenBasedTokenProvider
> 3. fs.azure.account.oauth2.client.secret: in ClientCredsTokenProvider
> 4. fs.azure.account.oauth2.client.endpoint: in UserPasswordTokenProvider
> 5. fs.azure.account.oauth2.user.name: in UserPasswordTokenProvider
> 6. fs.azure.account.oauth2.user.password: in  UserPasswordTokenProvider
> 7. fs.azure.account.oauth2.msi.tenant: in MsiTokenProvider
> 8. fs.azure.account.oauth2.refresh.token: in RefreshTokenBasedTokenProvider
> Right now, this method only checks that the value is non-null, not that it is 
> non-empty. This task adds a check that these config values are also non-empty.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18606) Add reason in x-ms-client-request-id on a retry API call.

2023-01-30 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-18606:
---
Fix Version/s: 3.4.0

> Add reason in x-ms-client-request-id on a retry API call.
> 
>
> Key: HADOOP-18606
> URL: https://issues.apache.org/jira/browse/HADOOP-18606
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Minor
> Fix For: 3.4.0
>
>
> In the header, x-ms-client-request-id contains information on which retry this 
> particular API call is, for example: 
> :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1.
> We want to add the reason for the retry in the header value: the same header 
> would include the retry reason when it is not the 0th iteration of the API 
> operation. It would look like
> :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_RT.
> This indicates that it is retry number 1 and that the 0th iteration failed due 
> to a read timeout.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18606) Add reason in x-ms-client-request-id on a retry API call.

2023-01-30 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-18606:
---
Component/s: fs/azure

> Add reason in x-ms-client-request-id on a retry API call.
> 
>
> Key: HADOOP-18606
> URL: https://issues.apache.org/jira/browse/HADOOP-18606
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Minor
> Fix For: 3.4.0
>
>
> In the header, x-ms-client-request-id contains information on which retry this 
> particular API call is, for example: 
> :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1.
> We want to add the reason for the retry in the header value: the same header 
> would include the retry reason when it is not the 0th iteration of the API 
> operation. It would look like
> :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_RT.
> This indicates that it is retry number 1 and that the 0th iteration failed due 
> to a read timeout.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-18606) Add reason in x-ms-client-request-id on a retry API call.

2023-01-30 Thread Pranav Saxena (Jira)
Pranav Saxena created HADOOP-18606:
--

 Summary: Add reason in x-ms-client-request-id on a retry API call.
 Key: HADOOP-18606
 URL: https://issues.apache.org/jira/browse/HADOOP-18606
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Pranav Saxena
Assignee: Pranav Saxena


In the header, x-ms-client-request-id contains information on which retry this 
particular API call is, for example: 
:eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1.

We want to add the reason for the retry in the header value: the same header 
would include the retry reason when it is not the 0th iteration of the API 
operation. It would look like
:eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_RT.
This indicates that it is retry number 1 and that the 0th iteration failed due to 
a read timeout.
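
A sketch of how the header value could be composed with the retry suffix (the 
format follows the examples above; the "RT" abbreviation and variable names are 
illustrative assumptions, not the exact ABFS implementation):

{code:java}
String clientRequestId = "eb06d8f6-5693-461b-b63c-5858fa7655e6";
String primaryRequestId = "29cb0d19-2b68-4409-bc35-cb7160b90dd8";
int retryCount = 1;
String retryReason = "RT";   // e.g. the previous attempt failed with a read timeout

// Append the retry reason only when this is not the 0th iteration.
String suffix = (retryCount == 0) ? String.valueOf(retryCount)
    : retryCount + "_" + retryReason;
String headerValue = ":" + clientRequestId + ":" + primaryRequestId + ":::CF:" + suffix;
// retryCount 0 -> ...:::CF:0
// retryCount 1 -> ...:::CF:1_RT
{code}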



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17912) ABFS: Support for Encryption Context

2023-01-22 Thread Pranav Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679670#comment-17679670
 ] 

Pranav Saxena commented on HADOOP-17912:


[~mehakmeet] [~mthakur], requesting you to kindly review the PR. Thanks.

> ABFS: Support for Encryption Context
> 
>
> Key: HADOOP-17912
> URL: https://issues.apache.org/jira/browse/HADOOP-17912
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Sumangala Patki
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Support for customer-provided encryption keys at the file level, superseding 
> the global (account-level) key use in HADOOP-17536.
> ABFS driver will support an "EncryptionContext" plugin for retrieving 
> encryption information, the implementation for which should be provided by 
> the client. The keys/context retrieved will be sent via request headers to 
> the server, which will store the encryption context. Subsequent REST calls to 
> server that access data/user metadata of the file will require fetching the 
> encryption context through a GetFileProperties call and retrieving the key 
> from the custom provider, before sending the request.
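
As a rough illustration of the plugin idea described above, an EncryptionContext 
provider could look like the following sketch. The interface and method names 
are assumptions for illustration; the actual ABFS plugin API may differ.

{code:java}
import java.io.IOException;

public interface EncryptionContextProviderSketch {
  /** Encryption context to send as request headers when a file is created. */
  byte[] createEncryptionContext(String path) throws IOException;

  /** Resolve the per-file key from a context fetched via GetFileProperties. */
  byte[] getEncryptionKey(String path, byte[] encryptionContext) throws IOException;
}
{code}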



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17912) ABFS: Support for Encryption Context

2023-01-18 Thread Pranav Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678488#comment-17678488
 ] 

Pranav Saxena commented on HADOOP-17912:


Hi [~ste...@apache.org], requesting you to kindly review the PR please.

Regards.

> ABFS: Support for Encryption Context
> 
>
> Key: HADOOP-17912
> URL: https://issues.apache.org/jira/browse/HADOOP-17912
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Sumangala Patki
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Support for customer-provided encryption keys at the file level, superseding 
> the global (account-level) key use in HADOOP-17536.
> ABFS driver will support an "EncryptionContext" plugin for retrieving 
> encryption information, the implementation for which should be provided by 
> the client. The keys/context retrieved will be sent via request headers to 
> the server, which will store the encryption context. Subsequent REST calls to 
> server that access data/user metadata of the file will require fetching the 
> encryption context through a GetFileProperties call and retrieving the key 
> from the custom provider, before sending the request.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-18546) disable purging list of in progress reads in abfs stream closed

2022-12-07 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena reassigned HADOOP-18546:
--

Assignee: Pranav Saxena  (was: Steve Loughran)

> disable purging list of in progress reads in abfs stream closed
> ---
>
> Key: HADOOP-18546
> URL: https://issues.apache.org/jira/browse/HADOOP-18546
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.4
>Reporter: Steve Loughran
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
>
> Turn off the pruning of in-progress reads in 
> ReadBufferManager::purgeBuffersForStream.
> This will ensure active prefetches for a closed stream complete. They will 
> then get to the completed list and hang around until evicted by timeout, but 
> at least prefetching will be safe.
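
A sketch of what the changed purge could look like (field and method names are 
illustrative, not the exact ReadBufferManager code): queued and completed buffers 
for the closed stream are removed, while in-progress reads are left to finish and 
are evicted later by the normal timeout.

{code:java}
import java.util.LinkedList;
import java.util.List;
import java.util.Queue;

class PurgeSketch {
  static class ReadBuffer {
    Object stream;
    Object getStream() { return stream; }
  }

  private final Queue<ReadBuffer> readAheadQueue = new LinkedList<>();
  private final List<ReadBuffer> completedReadList = new LinkedList<>();
  // the in-progress list is intentionally not touched by the purge below

  synchronized void purgeBuffersForStream(Object closedStream) {
    readAheadQueue.removeIf(buf -> buf.getStream() == closedStream);
    completedReadList.removeIf(buf -> buf.getStream() == closedStream);
  }
}
{code}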



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-18547) Check if config value is not empty string in AbfsConfiguration.getMandatoryPasswordString()

2022-11-30 Thread Pranav Saxena (Jira)
Pranav Saxena created HADOOP-18547:
--

 Summary: Check if config value is not empty string in 
AbfsConfiguration.getMandatoryPasswordString()
 Key: HADOOP-18547
 URL: https://issues.apache.org/jira/browse/HADOOP-18547
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Affects Versions: 3.3.4
Reporter: Pranav Saxena
Assignee: Pranav Saxena






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric

2022-11-04 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-18501:
---
Description: 
Error Description:
For a partial read (due to account backend throttling), the ABFS driver retries 
but does not add it to the throttling metrics.
In case of a partial read with a connection-reset exception, the ABFS driver 
retries the full request and does not add it to the throttling metrics.

Mitigation:
In case of a partial read, the ABFS driver should retry only the remaining 
bytes, and the partial read should be added to the throttling metrics.

  was:
Error Description:
For a partial read (due to account backend throttling), the ABFS driver doesn't 
retry and doesn't add it to the throttling metrics.

Mitigation:
The ABFS driver should retry for the remaining bytes. Also, the partial read 
should be added to the throttling metrics.


> [ABFS]: Partial Read should add to throttling metric
> 
>
> Key: HADOOP-18501
> URL: https://issues.apache.org/jira/browse/HADOOP-18501
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.4
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Minor
>
> Error Description:
> For a partial read (due to account backend throttling), the ABFS driver retries 
> but does not add it to the throttling metrics.
> In case of a partial read with a connection-reset exception, the ABFS driver 
> retries the full request and does not add it to the throttling metrics.
> Mitigation:
> In case of a partial read, the ABFS driver should retry only the remaining 
> bytes, and the partial read should be added to the throttling metrics.
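
A sketch of the mitigation (method names are illustrative, not the actual 
AbfsInputStream code): on a short read, retry only the remaining range and record 
the partial read so that it contributes to the throttling metrics.

{code:java}
abstract class PartialReadRetrySketch {
  // One REST read call; may return fewer bytes than requested under throttling.
  abstract int readRemote(long position, byte[] buffer, int offset, int length)
      throws java.io.IOException;

  // Assumed hook that feeds the throttling metrics.
  abstract void recordPartialReadForThrottling();

  int read(long position, byte[] buffer, int offset, int length)
      throws java.io.IOException {
    int totalRead = 0;
    while (totalRead < length) {
      int bytesRead = readRemote(position + totalRead, buffer, offset + totalRead,
          length - totalRead);            // request only the remaining bytes
      if (bytesRead <= 0) {
        break;                            // EOF or nothing returned; stop retrying
      }
      if (bytesRead < length - totalRead) {
        recordPartialReadForThrottling(); // count the partial read
      }
      totalRead += bytesRead;
    }
    return totalRead;
  }
}
{code}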



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric

2022-10-21 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-18501:
---
Component/s: fs/azure

> [ABFS]: Partial Read should add to throttling metric
> 
>
> Key: HADOOP-18501
> URL: https://issues.apache.org/jira/browse/HADOOP-18501
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Minor
>
> Error Description:
> For a partial read (due to account backend throttling), the ABFS driver doesn't 
> retry and doesn't add it to the throttling metrics.
> Mitigation:
> The ABFS driver should retry for the remaining bytes. Also, the partial read 
> should be added to the throttling metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric

2022-10-21 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-18501:
---
Affects Version/s: 3.3.4

> [ABFS]: Partial Read should add to throttling metric
> 
>
> Key: HADOOP-18501
> URL: https://issues.apache.org/jira/browse/HADOOP-18501
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.4
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Minor
>
> Error Description:
> For a partial read (due to account backend throttling), the ABFS driver doesn't 
> retry and doesn't add it to the throttling metrics.
> Mitigation:
> The ABFS driver should retry for the remaining bytes. Also, the partial read 
> should be added to the throttling metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric

2022-10-20 Thread Pranav Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620950#comment-17620950
 ] 

Pranav Saxena commented on HADOOP-18501:


Branch WIP: 
https://github.com/pranavsaxena-microsoft/hadoop/tree/partialReadThrottle

> [ABFS]: Partial Read should add to throttling metric
> 
>
> Key: HADOOP-18501
> URL: https://issues.apache.org/jira/browse/HADOOP-18501
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Minor
>
> Error Description:
> For a partial read (due to account backend throttling), the ABFS driver doesn't 
> retry and doesn't add it to the throttling metrics.
> Mitigation:
> The ABFS driver should retry for the remaining bytes. Also, the partial read 
> should be added to the throttling metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Updated] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric

2022-10-20 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena updated HADOOP-18501:
---
Description: 
Error Description:
For a partial read (due to account backend throttling), the ABFS driver doesn't 
retry and doesn't add it to the throttling metrics.

Mitigation:
The ABFS driver should retry for the remaining bytes. Also, the partial read 
should be added to the throttling metrics.

  was:
Error Description:
At present, SAS Tokens generated from the Azure Portal may or may not contain a 
? as a prefix. SAS Tokens that contain the ? prefix will lead to an error in 
the driver due to a clash of query parameters. This leads to customers having 
to manually remove the ? prefix before passing the SAS Tokens.

Mitigation:
After receiving the SAS Token from the provider, check if any prefix ? is 
present or not. If present, remove it and pass the SAS Token.


> [ABFS]: Partial Read should add to throttling metric
> 
>
> Key: HADOOP-18501
> URL: https://issues.apache.org/jira/browse/HADOOP-18501
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Minor
>
> Error Description:
> For a partial read (due to account backend throttling), the ABFS driver doesn't 
> retry and doesn't add it to the throttling metrics.
> Mitigation:
> The ABFS driver should retry for the remaining bytes. Also, the partial read 
> should be added to the throttling metrics.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric

2022-10-20 Thread Pranav Saxena (Jira)
Pranav Saxena created HADOOP-18501:
--

 Summary: [ABFS]: Partial Read should add to throttling metric
 Key: HADOOP-18501
 URL: https://issues.apache.org/jira/browse/HADOOP-18501
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Pranav Saxena
Assignee: Sree Bhattacharyya


Error Description:
At present, SAS Tokens generated from the Azure Portal may or may not contain a 
? as a prefix. SAS Tokens that contain the ? prefix will lead to an error in 
the driver due to a clash of query parameters. This leads to customers having 
to manually remove the ? prefix before passing the SAS Tokens.

Mitigation:
After receiving the SAS Token from the provider, check if any prefix ? is 
present or not. If present, remove it and pass the SAS Token.
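
A sketch of the described mitigation (the helper name is illustrative): normalize 
a SAS token that may or may not carry a leading '?' before it is appended to the 
request query string.

{code:java}
// Strip a leading '?' so the token can be appended safely to the query string.
static String normalizeSasToken(String sasToken) {
  return sasToken.startsWith("?") ? sasToken.substring(1) : sasToken;
}
{code}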



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric

2022-10-20 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena reassigned HADOOP-18501:
--

Assignee: Pranav Saxena  (was: Sree Bhattacharyya)

> [ABFS]: Partial Read should add to throttling metric
> 
>
> Key: HADOOP-18501
> URL: https://issues.apache.org/jira/browse/HADOOP-18501
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Minor
>
> Error Description:
> At present, SAS Tokens generated from the Azure Portal may or may not contain 
> a ? as a prefix. SAS Tokens that contain the ? prefix will lead to an error 
> in the driver due to a clash of query parameters. This leads to customers 
> having to manually remove the ? prefix before passing the SAS Tokens.
> Mitigation:
> After receiving the SAS Token from the provider, check if any prefix ? is 
> present or not. If present, remove it and pass the SAS Token.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Commented] (HADOOP-17912) ABFS: Support for Encryption Context

2022-10-05 Thread Pranav Saxena (Jira)


[ 
https://issues.apache.org/jira/browse/HADOOP-17912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17613289#comment-17613289
 ] 

Pranav Saxena commented on HADOOP-17912:


[~ste...@apache.org], requesting you to kindly review the PR: 
https://github.com/apache/hadoop/pull/3440. Thanks.

> ABFS: Support for Encryption Context
> 
>
> Key: HADOOP-17912
> URL: https://issues.apache.org/jira/browse/HADOOP-17912
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Sumangala Patki
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Support for customer-provided encryption keys at the file level, superseding 
> the global (account-level) key use in HADOOP-17536.
> ABFS driver will support an "EncryptionContext" plugin for retrieving 
> encryption information, the implementation for which should be provided by 
> the client. The keys/context retrieved will be sent via request headers to 
> the server, which will store the encryption context. Subsequent REST calls to 
> server that access data/user metadata of the file will require fetching the 
> encryption context through a GetFileProperties call and retrieving the key 
> from the custom provider, before sending the request.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-18408) [ABFS]: Ignore run of ITestAbfsRenameStageFailure for NonHNS-SharedKey configuration

2022-08-18 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena reassigned HADOOP-18408:
--

Assignee: Sree Bhattacharyya  (was: Pranav Saxena)

> [ABFS]: Ignore run of ITestAbfsRenameStageFailure for NonHNS-SharedKey 
> configuration
> 
>
> Key: HADOOP-18408
> URL: https://issues.apache.org/jira/browse/HADOOP-18408
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure, test
>Reporter: Pranav Saxena
>Assignee: Sree Bhattacharyya
>Priority: Minor
>  Labels: pull-request-available
>
> ITestAbfsRenameStageFailure fails for NonHNS-SharedKey configuration.
> Failure:
> [ERROR]   
> ITestAbfsRenameStageFailure>TestRenameStageFailure.testResilienceAsExpected:126
>  [resilient commit support] expected:<[tru]e> but was:<[fals]e>
> RCA:
> ResilientCommit checks whether etags are preserved on rename; if not, it throws 
> an exception and the flag for resilientCommitByRename stays null, ultimately 
> leading to the test failure.
> Mitigation:
> Since etags are not preserved on rename for non-HNS accounts, the test is not a 
> valid case for a non-HNS account. Hence, as part of this task, we shall ignore 
> this test for the non-HNS configuration.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Created] (HADOOP-18408) [ABFS]: Ignore run of ITestAbfsRenameStageFailure for NonHNS-SharedKey configuration

2022-08-17 Thread Pranav Saxena (Jira)
Pranav Saxena created HADOOP-18408:
--

 Summary: [ABFS]: Ignore run of ITestAbfsRenameStageFailure for 
NonHNS-SharedKey configuration
 Key: HADOOP-18408
 URL: https://issues.apache.org/jira/browse/HADOOP-18408
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Reporter: Pranav Saxena
Assignee: Pranav Saxena


ITestAbfsRenameStageFailure fails for NonHNS-SharedKey configuration.

Failure:
[ERROR]   
ITestAbfsRenameStageFailure>TestRenameStageFailure.testResilienceAsExpected:126 
[resilient commit support] expected:<[tru]e> but was:<[fals]e>

RCA:
ResilientCommit checks whether etags are preserved on rename; if not, it throws 
an exception and the flag for resilientCommitByRename stays null, ultimately 
leading to the test failure.

Mitigation:
Since etags are not preserved on rename for non-HNS accounts, the test is not a 
valid case for a non-HNS account. Hence, as part of this task, we shall ignore 
this test for the non-HNS configuration.
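
A sketch of how the test could be skipped (the namespace check shown is an 
assumed helper, not necessarily the exact method on the test base class); JUnit's 
Assume is the standard mechanism for conditional skips:

{code:java}
import static org.junit.Assume.assumeTrue;

public class ITestAbfsRenameStageFailureSketch {
  // Assumed helper: true when the account is HNS-enabled.
  private boolean isNamespaceEnabled() {
    return false;
  }

  public void setup() throws Exception {
    // Skip on non-HNS accounts, where etags are not preserved on rename.
    assumeTrue("Etags are not preserved on rename for non-HNS accounts; skipping",
        isNamespaceEnabled());
  }
}
{code}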



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org



[jira] [Assigned] (HADOOP-17912) ABFS: Support for Encryption Context

2022-07-13 Thread Pranav Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Pranav Saxena reassigned HADOOP-17912:
--

Assignee: Pranav Saxena  (was: Sumangala Patki)

> ABFS: Support for Encryption Context
> 
>
> Key: HADOOP-17912
> URL: https://issues.apache.org/jira/browse/HADOOP-17912
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Sumangala Patki
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Support for customer-provided encryption keys at the file level, superseding 
> the global (account-level) key use in HADOOP-17536.
> ABFS driver will support an "EncryptionContext" plugin for retrieving 
> encryption information, the implementation for which should be provided by 
> the client. The keys/context retrieved will be sent via request headers to 
> the server, which will store the encryption context. Subsequent REST calls to 
> server that access data/user metadata of the file will require fetching the 
> encryption context through a GetFileProperties call and retrieving the key 
> from the custom provider, before sending the request.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org