[jira] [Updated] (HADOOP-19120) [ABFS]: ApacheHttpClient adaptation as network library
[ https://issues.apache.org/jira/browse/HADOOP-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranav Saxena updated HADOOP-19120:
-----------------------------------
    Status: Patch Available  (was: Open)

https://github.com/apache/hadoop/pull/6633

> [ABFS]: ApacheHttpClient adaptation as network library
> ------------------------------------------------------
>
> Key: HADOOP-19120
> URL: https://issues.apache.org/jira/browse/HADOOP-19120
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.5.0
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
> Apache HttpClient is more feature-rich and flexible, and gives the application more granular control over networking parameters.
> ABFS currently relies on the JDK-net library. This library is managed by OpenJDK and has no performance problem. However, it limits the application's control over networking, and there are very few APIs and hooks exposed that the application can use to get metrics or to choose which connection should be reused, and when. ApacheHttpClient will give important hooks to fetch important metrics and control networking parameters.
> A custom implementation of connection-pool is used. The implementation is adapted from the JDK8 connection pooling. Reasons for doing it:
> 1. The PoolingHttpClientConnectionManager heuristic caches all the reusable connections it has created. The JDK's implementation only caches a limited number of connections. The limit is given by the JVM system property "http.maxConnections"; if the system property is not set, it defaults to 5. Connection-establishment latency increased when all the connections were cached. Hence, the pooling heuristic of the JDK netlib is adapted.
> 2. PoolingHttpClientConnectionManager expects the application to provide `setMaxPerRoute` and `setMaxTotal`, which the implementation uses as the total number of connections it can create. For an application using ABFS, it is not feasible to provide a value at the initialisation of the connectionManager. The JDK's implementation has no cap on the number of connections it can have open at a given moment. Hence, the pooling heuristic of the JDK netlib is adapted.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-issues-h...@hadoop.apache.org
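The JDK-netlib pooling heuristic described in the message above can be sketched in plain Java. This is a hypothetical illustration, not the actual ABFS or JDK code: an unbounded number of connections may be open at once, but only `http.maxConnections` (default 5) idle connections are cached for reuse, and the rest are closed on release.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of the JDK-style keep-alive heuristic: no cap on
// total open connections, but only "http.maxConnections" (default 5)
// idle connections are cached for reuse. Class and method names are
// illustrative, not the real ABFS implementation.
class KeepAliveCacheSketch<C> {
    private final int maxCached;
    private final Deque<C> idle = new ArrayDeque<>();

    KeepAliveCacheSketch() {
        // Same default as the JDK's HTTP keep-alive cache.
        this.maxCached = Integer.getInteger("http.maxConnections", 5);
    }

    /** Returns a cached idle connection, or null if the caller must open a new one. */
    synchronized C acquire() {
        return idle.pollFirst();
    }

    /** Caches the connection for reuse; returns false if the caller should close it. */
    synchronized boolean release(C conn) {
        if (idle.size() < maxCached) {
            idle.addFirst(conn);   // cached for reuse
            return true;
        }
        return false;              // cache full: caller closes conn
    }

    synchronized int cachedCount() {
        return idle.size();
    }
}
```

Because the cache only bounds *idle* connections, callers can still open as many concurrent connections as the workload needs, which is the property the message says `setMaxTotal`/`setMaxPerRoute` cannot express up front.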
[jira] [Updated] (HADOOP-19120) [ABFS]: ApacheHttpClient adaptation as network library
[ https://issues.apache.org/jira/browse/HADOOP-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranav Saxena updated HADOOP-19120:
-----------------------------------
    Fix Version/s: 3.5.0
                   3.4.1

> [ABFS]: ApacheHttpClient adaptation as network library
> ------------------------------------------------------
>
> Key: HADOOP-19120
> URL: https://issues.apache.org/jira/browse/HADOOP-19120
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.5.0
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
> Apache HttpClient is more feature-rich and flexible, and gives the application more granular control over networking parameters.
> ABFS currently relies on the JDK-net library. This library is managed by OpenJDK and has no performance problem. However, it limits the application's control over networking, and there are very few APIs and hooks exposed that the application can use to get metrics or to choose which connection should be reused, and when. ApacheHttpClient will give important hooks to fetch important metrics and control networking parameters.
> A custom implementation of connection-pool is used. The implementation is adapted from the JDK8 connection pooling. Reasons for doing it:
> 1. The PoolingHttpClientConnectionManager heuristic caches all the reusable connections it has created. The JDK's implementation only caches a limited number of connections. The limit is given by the JVM system property "http.maxConnections"; if the system property is not set, it defaults to 5. Connection-establishment latency increased when all the connections were cached. Hence, the pooling heuristic of the JDK netlib is adapted.
> 2. PoolingHttpClientConnectionManager expects the application to provide `setMaxPerRoute` and `setMaxTotal`, which the implementation uses as the total number of connections it can create. For an application using ABFS, it is not feasible to provide a value at the initialisation of the connectionManager. The JDK's implementation has no cap on the number of connections it can have open at a given moment. Hence, the pooling heuristic of the JDK netlib is adapted.
[jira] [Updated] (HADOOP-19137) [ABFS]:Extra getAcl call while calling the very first API of FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranav Saxena updated HADOOP-19137:
-----------------------------------
    Description:
The store doesn't flow the namespace information to the client.
In https://github.com/apache/hadoop/pull/6221, getIsNamespaceEnabled is added in client methods; it checks whether the namespace information is present, and if not, it makes a getAcl call and sets the field. Once the field is set, it is used in future getIsNamespaceEnabled calls for a given AbfsClient.
Since CPK (both global and encryptionContext) is only for HNS accounts, the proposed fix is to fail fs init if it is a non-HNS account and a CPK config is given.

    was:
The store doesn't flow the namespace information to the client.
In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added in client methods; it checks whether the namespace information is present, and if not, it makes a getAcl call and sets the field. Once the field is set, it is used in future getIsNamespaceEnabled calls for a given AbfsClient.
Since CPK (both global and encryptionContext) is only for HNS accounts, the proposed fix is to fail fs init if it is a non-HNS account and a CPK config is given.

> [ABFS]:Extra getAcl call while calling the very first API of FileSystem
> -----------------------------------------------------------------------
>
> Key: HADOOP-19137
> URL: https://issues.apache.org/jira/browse/HADOOP-19137
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.4.0
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
> Labels: pull-request-available
>
> The store doesn't flow the namespace information to the client.
> In https://github.com/apache/hadoop/pull/6221, getIsNamespaceEnabled is added in client methods; it checks whether the namespace information is present, and if not, it makes a getAcl call and sets the field. Once the field is set, it is used in future getIsNamespaceEnabled calls for a given AbfsClient.
> Since CPK (both global and encryptionContext) is only for HNS accounts, the proposed fix is to fail fs init if it is a non-HNS account and a CPK config is given.
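The lazy namespace probe described above can be sketched in a few lines. This is an illustrative stand-in, not the real AbfsClient API: the first `getIsNamespaceEnabled()` call issues a getAcl probe and caches the answer, so later calls on the same client make no further server call. `NamespaceProbe`, `probeCount`, and the supplier are hypothetical names.

```java
import java.util.function.Supplier;

// Hypothetical sketch of the lazy namespace check: the first call probes
// getAcl and caches the result; subsequent calls reuse the cached field.
class NamespaceProbe {
    private Boolean isNamespaceEnabled;       // null until the first probe
    private int getAclCalls = 0;
    private final Supplier<Boolean> getAcl;   // stands in for the remote getAcl call

    NamespaceProbe(Supplier<Boolean> getAcl) {
        this.getAcl = getAcl;
    }

    synchronized boolean getIsNamespaceEnabled() {
        if (isNamespaceEnabled == null) {
            getAclCalls++;
            // A real getAcl only succeeds on HNS accounts; here the supplier
            // simply reports whether the account is HNS-enabled.
            isNamespaceEnabled = getAcl.get();
        }
        return isNamespaceEnabled;
    }

    synchronized int probeCount() {
        return getAclCalls;
    }
}
```

The point of the issue is the caching: however many filesystem APIs are invoked afterwards, at most one extra getAcl call is made per client instance.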
[jira] [Updated] (HADOOP-19137) [ABFS]:Extra getAcl call while calling the very first API of FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranav Saxena updated HADOOP-19137:
-----------------------------------
    Description:
The store doesn't flow the namespace information to the client.
In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added in client methods; it checks whether the namespace information is present, and if not, it makes a getAcl call and sets the field. Once the field is set, it is used in future getIsNamespaceEnabled calls for a given AbfsClient.
Since CPK (both global and encryptionContext) is for HNS accounts, the proposed fix is to fail fs init if it is a non-HNS account and a CPK config is given.

    was:
The store doesn't flow the namespace information to the client.
In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added in client methods; it checks whether the namespace information is present, and if not, it makes a getAcl call and sets the field. Once the field is set, it is used in future getIsNamespaceEnabled calls for a given AbfsClient.

> [ABFS]:Extra getAcl call while calling the very first API of FileSystem
> -----------------------------------------------------------------------
>
> Key: HADOOP-19137
> URL: https://issues.apache.org/jira/browse/HADOOP-19137
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.4.0
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
>
> The store doesn't flow the namespace information to the client.
> In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added in client methods; it checks whether the namespace information is present, and if not, it makes a getAcl call and sets the field. Once the field is set, it is used in future getIsNamespaceEnabled calls for a given AbfsClient.
> Since CPK (both global and encryptionContext) is for HNS accounts, the proposed fix is to fail fs init if it is a non-HNS account and a CPK config is given.
[jira] [Updated] (HADOOP-19137) [ABFS]:Extra getAcl call while calling the very first API of FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranav Saxena updated HADOOP-19137:
-----------------------------------
    Description:
The store doesn't flow the namespace information to the client.
In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added in client methods; it checks whether the namespace information is present, and if not, it makes a getAcl call and sets the field. Once the field is set, it is used in future getIsNamespaceEnabled calls for a given AbfsClient.
Since CPK (both global and encryptionContext) is only for HNS accounts, the proposed fix is to fail fs init if it is a non-HNS account and a CPK config is given.

    was:
The store doesn't flow the namespace information to the client.
In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added in client methods; it checks whether the namespace information is present, and if not, it makes a getAcl call and sets the field. Once the field is set, it is used in future getIsNamespaceEnabled calls for a given AbfsClient.
Since CPK (both global and encryptionContext) is for HNS accounts, the proposed fix is to fail fs init if it is a non-HNS account and a CPK config is given.

> [ABFS]:Extra getAcl call while calling the very first API of FileSystem
> -----------------------------------------------------------------------
>
> Key: HADOOP-19137
> URL: https://issues.apache.org/jira/browse/HADOOP-19137
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.4.0
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
>
> The store doesn't flow the namespace information to the client.
> In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added in client methods; it checks whether the namespace information is present, and if not, it makes a getAcl call and sets the field. Once the field is set, it is used in future getIsNamespaceEnabled calls for a given AbfsClient.
> Since CPK (both global and encryptionContext) is only for HNS accounts, the proposed fix is to fail fs init if it is a non-HNS account and a CPK config is given.
[jira] [Created] (HADOOP-19139) [ABFS]: No GetPathStatus call for opening AbfsInputStream
Pranav Saxena created HADOOP-19139:
-----------------------------------
Summary: [ABFS]: No GetPathStatus call for opening AbfsInputStream
Key: HADOOP-19139
URL: https://issues.apache.org/jira/browse/HADOOP-19139
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Reporter: Pranav Saxena
Assignee: Pranav Saxena

The read API gives the contentLen and eTag of the path. This information would be used in future calls on that inputStream. Prior knowledge of the eTag is of little importance.
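The idea above — skip the GetPathStatus call at open and learn the metadata from the first read response — can be sketched as follows. `ReadResponse` and the method names are illustrative stand-ins, not the real ABFS response objects.

```java
// Hypothetical sketch: the stream starts with unknown metadata and fills
// contentLength/eTag from the headers of its first read response, so no
// separate GetPathStatus call is needed at open().
class LazyOpenStreamSketch {
    static final class ReadResponse {
        final byte[] data;
        final long contentLength;
        final String eTag;
        ReadResponse(byte[] data, long contentLength, String eTag) {
            this.data = data;
            this.contentLength = contentLength;
            this.eTag = eTag;
        }
    }

    private long contentLength = -1;   // unknown until the first read
    private String eTag = null;

    /** Caches metadata from a read response; first read populates the fields. */
    void onReadResponse(ReadResponse r) {
        if (eTag == null) {
            contentLength = r.contentLength;
            eTag = r.eTag;
        }
    }

    long getContentLength() { return contentLength; }
    String getETag() { return eTag; }
}
```

Subsequent reads on the same stream can then validate against the cached eTag and bound themselves by the cached contentLength without an extra round trip.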
[jira] [Created] (HADOOP-19137) [ABFS]:Extra getAcl call while calling first API of FileSystem
Pranav Saxena created HADOOP-19137:
-----------------------------------
Summary: [ABFS]:Extra getAcl call while calling first API of FileSystem
Key: HADOOP-19137
URL: https://issues.apache.org/jira/browse/HADOOP-19137
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Affects Versions: 3.4.0
Reporter: Pranav Saxena
Assignee: Pranav Saxena

The store doesn't flow the namespace information to the client.
In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added in client methods; it checks whether the namespace information is present, and if not, it makes a getAcl call and sets the field. Once the field is set, it is used in future getIsNamespaceEnabled calls for a given AbfsClient.
[jira] [Updated] (HADOOP-19137) [ABFS]:Extra getAcl call while calling the very first API of FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranav Saxena updated HADOOP-19137:
-----------------------------------
    Summary: [ABFS]:Extra getAcl call while calling the very first API of FileSystem  (was: [ABFS]:Extra getAcl call while calling first API of FileSystem)

> [ABFS]:Extra getAcl call while calling the very first API of FileSystem
> -----------------------------------------------------------------------
>
> Key: HADOOP-19137
> URL: https://issues.apache.org/jira/browse/HADOOP-19137
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.4.0
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
>
> The store doesn't flow the namespace information to the client.
> In https://github.com/apache/hadoop/pull/3440, getIsNamespaceEnabled is added in client methods; it checks whether the namespace information is present, and if not, it makes a getAcl call and sets the field. Once the field is set, it is used in future getIsNamespaceEnabled calls for a given AbfsClient.
[jira] [Updated] (HADOOP-19120) [ABFS]: ApacheHttpClient adaptation as network library
[ https://issues.apache.org/jira/browse/HADOOP-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranav Saxena updated HADOOP-19120:
-----------------------------------
    Description:
Apache HttpClient is more feature-rich and flexible, and gives the application more granular control over networking parameters.
ABFS currently relies on the JDK-net library. This library is managed by OpenJDK and has no performance problem. However, it limits the application's control over networking, and there are very few APIs and hooks exposed that the application can use to get metrics or to choose which connection should be reused, and when. ApacheHttpClient will give important hooks to fetch important metrics and control networking parameters.
A custom implementation of connection-pool is used. The implementation is adapted from the JDK8 connection pooling. Reasons for doing it:
1. The PoolingHttpClientConnectionManager heuristic caches all the reusable connections it has created. The JDK's implementation only caches a limited number of connections. The limit is given by the JVM system property "http.maxConnections"; if the system property is not set, it defaults to 5. Connection-establishment latency increased when all the connections were cached. Hence, the pooling heuristic of the JDK netlib is adapted.
2. PoolingHttpClientConnectionManager expects the application to provide `setMaxPerRoute` and `setMaxTotal`, which the implementation uses as the total number of connections it can create. For an application using ABFS, it is not feasible to provide a value at the initialisation of the connectionManager. The JDK's implementation has no cap on the number of connections it can have open at a given moment. Hence, the pooling heuristic of the JDK netlib is adapted.

    was:
Apache HttpClient is more feature-rich and flexible, and gives the application more granular control over networking parameters.
ABFS currently relies on the JDK-net library. This library is managed by OpenJDK and has no performance problem. However, it limits the application's control over networking, and there are very few APIs and hooks exposed that the application can use to get metrics or to choose which connection should be reused, and when. ApacheHttpClient will give important hooks to fetch important metrics and control networking parameters.
A custom implementation of connection-pool is used. The implementation is adapted from the JDK8 connection pooling. Reasons for doing it:
1. The PoolingHttpClientConnectionManager heuristic caches all the reusable connections it has created. The JDK's implementation only caches a limited number of connections. The limit is given by the JVM system property "http.maxConnections"; if the system property is not set, it defaults to 5. Connection-establishment latency increased when all the connections were cached. Hence, the pooling heuristic of the JDK netlib is adapted.
2. PoolingHttpClientConnectionManager expects the application to provide `setMaxPerRoute` and `setMaxTotal`, which the implementation uses as the total number of connections it can create. For an application using ABFS, it is not feasible to provide a value at the initialisation of the connectionManager. The JDK's implementation has no cap on the number of connections it can have open at a given moment.

> [ABFS]: ApacheHttpClient adaptation as network library
> ------------------------------------------------------
>
> Key: HADOOP-19120
> URL: https://issues.apache.org/jira/browse/HADOOP-19120
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.5.0
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
> Labels: pull-request-available
>
> Apache HttpClient is more feature-rich and flexible, and gives the application more granular control over networking parameters.
> ABFS currently relies on the JDK-net library. This library is managed by OpenJDK and has no performance problem. However, it limits the application's control over networking, and there are very few APIs and hooks exposed that the application can use to get metrics or to choose which connection should be reused, and when. ApacheHttpClient will give important hooks to fetch important metrics and control networking parameters.
> A custom implementation of connection-pool is used. The implementation is adapted from the JDK8 connection pooling. Reasons for doing it:
> 1. The PoolingHttpClientConnectionManager heuristic caches all the reusable connections it has created. The JDK's implementation only caches a limited number of connections. The limit is given by the JVM system property "http.maxConnections"; if the system property is not set, it defaults to 5. Connection-establishment latency increased when all the connections were cached. Hence, the pooling heuristic of the JDK netlib is adapted.
> 2. In PoolingHttpClientConnectionManager, it expects the application to provide `setMaxPerRoute` and `setMaxTotal`, which the implementation uses as the total number of connections it can create.
[jira] [Updated] (HADOOP-19120) [ABFS]: ApacheHttpClient adaptation as network library
[ https://issues.apache.org/jira/browse/HADOOP-19120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranav Saxena updated HADOOP-19120:
-----------------------------------
    Description:
Apache HttpClient is more feature-rich and flexible, and gives the application more granular control over networking parameters.
ABFS currently relies on the JDK-net library. This library is managed by OpenJDK and has no performance problem. However, it limits the application's control over networking, and there are very few APIs and hooks exposed that the application can use to get metrics or to choose which connection should be reused, and when. ApacheHttpClient will give important hooks to fetch important metrics and control networking parameters.
A custom implementation of connection-pool is used. The implementation is adapted from the JDK8 connection pooling. Reasons for doing it:
1. The PoolingHttpClientConnectionManager heuristic caches all the reusable connections it has created. The JDK's implementation only caches a limited number of connections. The limit is given by the JVM system property "http.maxConnections"; if the system property is not set, it defaults to 5. Connection-establishment latency increased when all the connections were cached. Hence, the pooling heuristic of the JDK netlib is adapted.
2. PoolingHttpClientConnectionManager expects the application to provide `setMaxPerRoute` and `setMaxTotal`, which the implementation uses as the total number of connections it can create. For an application using ABFS, it is not feasible to provide a value at the initialisation of the connectionManager. The JDK's implementation has no cap on the number of connections it can have open at a given moment.

    was:
Apache HttpClient is more feature-rich and flexible, and gives the application more granular control over networking parameters.
ABFS currently relies on the JDK-net library. This library is managed by OpenJDK and has no performance problem. However, it limits the application's control over networking, and there are very few APIs and hooks exposed that the application can use to get metrics or to choose which connection should be reused, and when. ApacheHttpClient will give important hooks to fetch important metrics and control networking parameters.

> [ABFS]: ApacheHttpClient adaptation as network library
> ------------------------------------------------------
>
> Key: HADOOP-19120
> URL: https://issues.apache.org/jira/browse/HADOOP-19120
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Affects Versions: 3.5.0
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
> Labels: pull-request-available
>
> Apache HttpClient is more feature-rich and flexible, and gives the application more granular control over networking parameters.
> ABFS currently relies on the JDK-net library. This library is managed by OpenJDK and has no performance problem. However, it limits the application's control over networking, and there are very few APIs and hooks exposed that the application can use to get metrics or to choose which connection should be reused, and when. ApacheHttpClient will give important hooks to fetch important metrics and control networking parameters.
> A custom implementation of connection-pool is used. The implementation is adapted from the JDK8 connection pooling. Reasons for doing it:
> 1. The PoolingHttpClientConnectionManager heuristic caches all the reusable connections it has created. The JDK's implementation only caches a limited number of connections. The limit is given by the JVM system property "http.maxConnections"; if the system property is not set, it defaults to 5. Connection-establishment latency increased when all the connections were cached. Hence, the pooling heuristic of the JDK netlib is adapted.
> 2. PoolingHttpClientConnectionManager expects the application to provide `setMaxPerRoute` and `setMaxTotal`, which the implementation uses as the total number of connections it can create. For an application using ABFS, it is not feasible to provide a value at the initialisation of the connectionManager. The JDK's implementation has no cap on the number of connections it can have open at a given moment.
[jira] [Created] (HADOOP-19120) [ABFS]: ApacheHttpClient adaptation as network library
Pranav Saxena created HADOOP-19120:
-----------------------------------
Summary: [ABFS]: ApacheHttpClient adaptation as network library
Key: HADOOP-19120
URL: https://issues.apache.org/jira/browse/HADOOP-19120
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Affects Versions: 3.5.0
Reporter: Pranav Saxena
Assignee: Pranav Saxena

Apache HttpClient is more feature-rich and flexible, and gives the application more granular control over networking parameters.
ABFS currently relies on the JDK-net library. This library is managed by OpenJDK and has no performance problem. However, it limits the application's control over networking, and there are very few APIs and hooks exposed that the application can use to get metrics or to choose which connection should be reused, and when. ApacheHttpClient will give important hooks to fetch important metrics and control networking parameters.
[jira] [Updated] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
[ https://issues.apache.org/jira/browse/HADOOP-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranav Saxena updated HADOOP-19102:
-----------------------------------
    Description:
The method `optimisedRead` creates a buffer array of size `readBufferSize`. If footerReadBufferSize is greater than readBufferSize, abfs will attempt to read more data than the buffer array can hold, which causes an exception.
Change: To avoid this, we will keep footerBufferSize = min(readBufferSizeConfig, footerBufferSizeConfig).

    was:
The method `optimisedRead` creates a buffer array of size `readBufferSize`. If footerReadBufferSize is greater than readBufferSize, abfs will attempt to read more data than the buffer array can hold, which causes an exception.
Change: To avoid this, we will assign readBufferSize to footerReadBufferSize when footerReadBufferSize is larger than readBufferSize.

> [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
> ----------------------------------------------------------------------
>
> Key: HADOOP-19102
> URL: https://issues.apache.org/jira/browse/HADOOP-19102
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
> Fix For: 3.4.0, 3.5.0
>
> The method `optimisedRead` creates a buffer array of size `readBufferSize`. If footerReadBufferSize is greater than readBufferSize, abfs will attempt to read more data than the buffer array can hold, which causes an exception.
> Change: To avoid this, we will keep footerBufferSize = min(readBufferSizeConfig, footerBufferSizeConfig).
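The clamp described above is a one-liner. This is a minimal sketch with an illustrative method name, not the actual ABFS code: the effective footer read buffer size is the minimum of the two configured sizes, so `optimisedRead` never reads more bytes than its buffer array can hold.

```java
// Minimal sketch of the fix: clamp the footer read buffer size to the
// read buffer size, so the read never exceeds the buffer array capacity.
// The method name is illustrative, not the real ABFS API.
class FooterBufferClamp {
    static int effectiveFooterReadBufferSize(int readBufferSize, int footerReadBufferSize) {
        return Math.min(readBufferSize, footerReadBufferSize);
    }
}
```

Compared with the earlier wording ("assign readBufferSize to footerReadBufferSize when larger"), expressing it as a min() makes the invariant explicit: the result can never exceed readBufferSize.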
[jira] [Updated] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
[ https://issues.apache.org/jira/browse/HADOOP-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Pranav Saxena updated HADOOP-19102:
-----------------------------------
    Fix Version/s: 3.4.0
                   3.5.0

> [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
> ----------------------------------------------------------------------
>
> Key: HADOOP-19102
> URL: https://issues.apache.org/jira/browse/HADOOP-19102
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
> Fix For: 3.4.0, 3.5.0
>
> The method `optimisedRead` creates a buffer array of size `readBufferSize`. If footerReadBufferSize is greater than readBufferSize, abfs will attempt to read more data than the buffer array can hold, which causes an exception.
> Change: To avoid this, we will assign readBufferSize to footerReadBufferSize when footerReadBufferSize is larger than readBufferSize.
[jira] [Created] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
Pranav Saxena created HADOOP-19102:
-----------------------------------
Summary: [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
Key: HADOOP-19102
URL: https://issues.apache.org/jira/browse/HADOOP-19102
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Reporter: Pranav Saxena
Assignee: Pranav Saxena

The method `optimisedRead` creates a buffer array of size `readBufferSize`. If footerReadBufferSize is greater than readBufferSize, abfs will attempt to read more data than the buffer array can hold, which causes an exception.
Change: To avoid this, we will assign readBufferSize to footerReadBufferSize when footerReadBufferSize is larger than readBufferSize.
[jira] [Comment Edited] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls
[ https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783216#comment-17783216 ]

Pranav Saxena edited comment on HADOOP-18883 at 11/6/23 1:16 PM:
----------------------------------------------------------------
Hi [~ste...@apache.org]. Requesting you to kindly review the PR please. This is a PR to prevent a day-0 JDK bug around expect-100 in abfs. Would be really awesome to get your feedback on this. Thank you so much.

was (Author: pranavsaxena):
Hi [~ste...@apache.org] Requesting you to kindly review the PR please. This is a PR to prevent a day-0 JDK bug around expect-100 in abfs. Would be really awesome to get your feedback on this. Thank you so much.

> Expect-100 JDK bug resolution: prevent multiple server calls
> ------------------------------------------------------------
>
> Key: HADOOP-18883
> URL: https://issues.apache.org/jira/browse/HADOOP-18883
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/azure
> Reporter: Pranav Saxena
> Assignee: Pranav Saxena
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.0
>
> This is in line with the JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978].
>
> With the current implementation of HttpURLConnection, if the server rejects the "Expect: 100-continue" then a 'java.net.ProtocolException' will be thrown from the 'expect100Continue()' method.
> After the exception is thrown, if we call any other method on the same instance (e.g. getHeaderField() or getHeaderFields()), they will internally call getOutputStream(), which invokes writeRequests(), which makes the actual server call.
> In AbfsHttpOperation, after sendRequest() we call the processResponse() method from AbfsRestOperation. Even if conn.getOutputStream() fails due to the expect-100 error, we consume the exception and let the code go ahead. So we can have getHeaderField() / getHeaderFields() / getHeaderFieldLong() triggered after getOutputStream has failed. These invocations will lead to server calls.
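The guard implied by the issue above can be sketched in plain Java. This is a hypothetical illustration, not the real fix: once the send fails with an expect-100 rejection, the wrapper records that the connection is unusable and answers header queries from a cached (empty) view instead of touching the connection again, which per the JDK bug would re-drive the request to the server. `Expect100Guard` and its fields are invented names.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical sketch: remember the expect-100 rejection and never call
// back into the connection afterwards, so no second server call is made.
class Expect100Guard {
    private boolean connectionDisconnectedOnError = false;
    private int serverCalls = 0;

    /** Simulates sendRequest() where the server rejects "Expect: 100-continue". */
    void sendRequest() {
        serverCalls++;
        connectionDisconnectedOnError = true;   // remember the rejection
    }

    /** Safe header access: after a rejection, return a cached empty view. */
    Map<String, String> getHeaderFields() {
        if (connectionDisconnectedOnError) {
            return Collections.emptyMap();      // no second server call
        }
        serverCalls++;                          // would hit the server via getOutputStream()
        return Collections.emptyMap();
    }

    int serverCallCount() { return serverCalls; }
}
```

The invariant the issue asks for is visible in the counter: however many header accessors run after the failed send, the server is contacted exactly once.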
[jira] [Commented] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls
[ https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17783216#comment-17783216 ] Pranav Saxena commented on HADOOP-18883: Hi [~ste...@apache.org] Requesting you to kindly review the PR please. This is the PR to prevent a day-0 JDK bug around expect-100 in ABFS. Would be really awesome to get your feedback on this. Thank you so much. > Expect-100 JDK bug resolution: prevent multiple server calls > > > Key: HADOOP-18883 > URL: https://issues.apache.org/jira/browse/HADOOP-18883 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > This is in line with the JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978]. > > With the current implementation of HttpURLConnection, if the server rejects the > "Expect: 100-continue" header, a 'java.net.ProtocolException' is thrown from the > 'expect100Continue()' method. > After the exception is thrown, if we call any other method on the same instance > (e.g. getHeaderField() or getHeaderFields()), it will internally call > getOutputStream(), which invokes writeRequests(), which makes the actual server > call. > In AbfsHttpOperation, after sendRequest() we call the processResponse() > method of AbfsRestOperation. Even if conn.getOutputStream() fails due to the > expect-100 error, we consume the exception and let the code go ahead. So > getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be > triggered after getOutputStream() has failed. These invocations lead > to server calls.
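The guard described above can be sketched as follows. This is a minimal illustration, not the actual ABFS fix: `Expect100GuardedOperation` and its methods are hypothetical names. The idea is to record the expect-100 failure and short-circuit later header lookups, so that they can never reach the underlying HttpURLConnection getters (which would re-send the request).

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: once the connection has failed on expect-100,
// header accessors return null instead of touching the connection again.
public class Expect100GuardedOperation {
    private boolean connectionDisconnectedOnError = false;
    private Map<String, List<String>> headers = Collections.emptyMap();

    // Called when conn.getOutputStream() throws ProtocolException
    // because the server rejected "Expect: 100-continue".
    public void markExpect100Failure() {
        connectionDisconnectedOnError = true;
    }

    // Short-circuit: never delegate to the connection after a failure,
    // since HttpURLConnection's header getters would trigger a new server call.
    public String getHeaderField(String name) {
        if (connectionDisconnectedOnError) {
            return null;
        }
        List<String> values = headers.get(name);
        return (values == null || values.isEmpty()) ? null : values.get(0);
    }
}
```

The same flag would guard getHeaderFields() and getHeaderFieldLong() in the scenario the issue describes.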
[jira] [Updated] (HADOOP-18960) ABFS contract-tests with Hadoop-Commons failing
[ https://issues.apache.org/jira/browse/HADOOP-18960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena updated HADOOP-18960: --- Fix Version/s: 3.4.0 (was: 3.3.6) > ABFS contract-tests with Hadoop-Commons failing > --- > > Key: HADOOP-18960 > URL: https://issues.apache.org/jira/browse/HADOOP-18960 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Reporter: Pranav Saxena >Priority: Minor > Fix For: 3.4.0 > > > In the merged PR [HADOOP-18869: [ABFS] Fixing Behavior of a File System APIs > on root path by anujmodi2021 · Pull Request #6003 · apache/hadoop > (github.com)|https://github.com/apache/hadoop/pull/6003], a config was > switched on: `fs.contract.test.root-tests-enabled`. This enables the root > manipulation tests of the filesystem contract. > The contract tests in ABFS run under the executionId > integration-test-abfs-parallel-classes of the pom. The tests run in > different JVMs, and multiple such JVMs can be active at a given instant, > depending on ${testsThreadCount}. The problem is that all the test JVMs for > the contract tests use the same container, which is defined by > `fs.contract.test.fs.abfs`. Due to this, one JVM's root-contract runs can > influence another JVM's root-contract runs. This leads to CI failures for the > hadoop-azure package.
[jira] [Updated] (HADOOP-18960) ABFS contract-tests with Hadoop-Commons intermittently failing
[ https://issues.apache.org/jira/browse/HADOOP-18960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena updated HADOOP-18960: --- Summary: ABFS contract-tests with Hadoop-Commons intermittently failing (was: ABFS contract-tests with Hadoop-Commons failing) > ABFS contract-tests with Hadoop-Commons intermittently failing > -- > > Key: HADOOP-18960 > URL: https://issues.apache.org/jira/browse/HADOOP-18960 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Reporter: Pranav Saxena >Priority: Minor > Fix For: 3.4.0 > > > In the merged PR [HADOOP-18869: [ABFS] Fixing Behavior of a File System APIs > on root path by anujmodi2021 · Pull Request #6003 · apache/hadoop > (github.com)|https://github.com/apache/hadoop/pull/6003], a config was > switched on: `fs.contract.test.root-tests-enabled`. This enables the root > manipulation tests of the filesystem contract. > The contract tests in ABFS run under the executionId > integration-test-abfs-parallel-classes of the pom. The tests run in > different JVMs, and multiple such JVMs can be active at a given instant, > depending on ${testsThreadCount}. The problem is that all the test JVMs for > the contract tests use the same container, which is defined by > `fs.contract.test.fs.abfs`. Due to this, one JVM's root-contract runs can > influence another JVM's root-contract runs. This leads to CI failures for the > hadoop-azure package.
[jira] [Created] (HADOOP-18960) ABFS contract-tests with Hadoop-Commons failing
Pranav Saxena created HADOOP-18960: -- Summary: ABFS contract-tests with Hadoop-Commons failing Key: HADOOP-18960 URL: https://issues.apache.org/jira/browse/HADOOP-18960 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure Reporter: Pranav Saxena Fix For: 3.3.6 In the merged PR [HADOOP-18869: [ABFS] Fixing Behavior of a File System APIs on root path by anujmodi2021 · Pull Request #6003 · apache/hadoop (github.com)|https://github.com/apache/hadoop/pull/6003], a config was switched on: `fs.contract.test.root-tests-enabled`. This enables the root manipulation tests of the filesystem contract. The contract tests in ABFS run under the executionId integration-test-abfs-parallel-classes of the pom. The tests run in different JVMs, and multiple such JVMs can be active at a given instant, depending on ${testsThreadCount}. The problem is that all the test JVMs for the contract tests use the same container, which is defined by `fs.contract.test.fs.abfs`. Due to this, one JVM's root-contract runs can influence another JVM's root-contract runs. This leads to CI failures for the hadoop-azure package.
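One way to avoid parallel test JVMs sharing a single container is to derive a per-JVM container name. This is a hypothetical sketch of that idea, not the fix adopted in the project; `uniqueContainerName` is an illustrative helper.

```java
// Hypothetical sketch: give each test JVM its own container name so that
// root-manipulation contract tests in parallel JVMs cannot interfere.
public class UniqueTestContainer {
    public static String uniqueContainerName(String base) {
        // The JVM's pid distinguishes parallel forks started by the same build.
        long pid = ProcessHandle.current().pid();
        return base + "-" + pid;
    }
}
```

The derived name would then be fed into `fs.contract.test.fs.abfs` per fork instead of a single shared value.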
[jira] [Commented] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls
[ https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17778522#comment-17778522 ] Pranav Saxena commented on HADOOP-18883: Hi [~ste...@apache.org] [~mehakmeetSingh]. Requesting you to kindly review the PR please. Thank you so much. > Expect-100 JDK bug resolution: prevent multiple server calls > > > Key: HADOOP-18883 > URL: https://issues.apache.org/jira/browse/HADOOP-18883 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.4.0 > > > This is in line with the JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978]. > > With the current implementation of HttpURLConnection, if the server rejects the > "Expect: 100-continue" header, a 'java.net.ProtocolException' is thrown from the > 'expect100Continue()' method. > After the exception is thrown, if we call any other method on the same instance > (e.g. getHeaderField() or getHeaderFields()), it will internally call > getOutputStream(), which invokes writeRequests(), which makes the actual server > call. > In AbfsHttpOperation, after sendRequest() we call the processResponse() > method of AbfsRestOperation. Even if conn.getOutputStream() fails due to the > expect-100 error, we consume the exception and let the code go ahead. So > getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be > triggered after getOutputStream() has failed. These invocations lead > to server calls.
[jira] [Commented] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls
[ https://issues.apache.org/jira/browse/HADOOP-18883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17764986#comment-17764986 ] Pranav Saxena commented on HADOOP-18883: Hi [~ste...@apache.org], This Jira is not related to HADOOP-18865. This Jira is a resolution on the ABFS side for the JDK bug. This JDK bug has always been there. Since it is discovered now, we want to have a resolution on our side. Thank you. > Expect-100 JDK bug resolution: prevent multiple server calls > > > Key: HADOOP-18883 > URL: https://issues.apache.org/jira/browse/HADOOP-18883 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Major > Fix For: 3.4.0 > > > This is in line with the JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978]. > > With the current implementation of HttpURLConnection, if the server rejects the > "Expect: 100-continue" header, a 'java.net.ProtocolException' is thrown from the > 'expect100Continue()' method. > After the exception is thrown, if we call any other method on the same instance > (e.g. getHeaderField() or getHeaderFields()), it will internally call > getOutputStream(), which invokes writeRequests(), which makes the actual server > call. > In AbfsHttpOperation, after sendRequest() we call the processResponse() > method of AbfsRestOperation. Even if conn.getOutputStream() fails due to the > expect-100 error, we consume the exception and let the code go ahead. So > getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be > triggered after getOutputStream() has failed. These invocations lead > to server calls.
[jira] [Created] (HADOOP-18883) Expect-100 JDK bug resolution: prevent multiple server calls
Pranav Saxena created HADOOP-18883: -- Summary: Expect-100 JDK bug resolution: prevent multiple server calls Key: HADOOP-18883 URL: https://issues.apache.org/jira/browse/HADOOP-18883 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure Reporter: Pranav Saxena Assignee: Pranav Saxena Fix For: 3.4.0 This is in line with the JDK bug: [https://bugs.openjdk.org/browse/JDK-8314978]. With the current implementation of HttpURLConnection, if the server rejects the "Expect: 100-continue" header, a 'java.net.ProtocolException' is thrown from the 'expect100Continue()' method. After the exception is thrown, if we call any other method on the same instance (e.g. getHeaderField() or getHeaderFields()), it will internally call getOutputStream(), which invokes writeRequests(), which makes the actual server call. In AbfsHttpOperation, after sendRequest() we call the processResponse() method of AbfsRestOperation. Even if conn.getOutputStream() fails due to the expect-100 error, we consume the exception and let the code go ahead. So getHeaderField() / getHeaderFields() / getHeaderFieldLong() can be triggered after getOutputStream() has failed. These invocations lead to server calls.
[jira] (HADOOP-18873) ABFS: AbfsOutputStream doesnt close DataBlocks object.
[ https://issues.apache.org/jira/browse/HADOOP-18873 ] Pranav Saxena deleted comment on HADOOP-18873: was (Author: pranavsaxena): pr: [HADOOP-18873. ABFS: AbfsOutputStream doesnt close DataBlocks object. by saxenapranav · Pull Request #6010 · apache/hadoop (github.com)|https://github.com/apache/hadoop/pull/6010] > ABFS: AbfsOutputStream doesnt close DataBlocks object. > -- > > Key: HADOOP-18873 > URL: https://issues.apache.org/jira/browse/HADOOP-18873 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.3.4 >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Major > Fix For: 3.3.4 > > > AbfsOutputStream doesn't close the DataBlock object created for the upload. > The implication of not doing that: > DataBlocks has three implementations: > # ByteArrayBlock > ## This creates an object of DataBlockByteArrayOutputStream (a child of > ByteArrayOutputStream): a wrapper around a byte array for populating and reading > the array. > ## This gets GCed. > # ByteBufferBlock: > ## There is a defined *DirectBufferPool* from which it tries to request a > direct buffer. > ## If nothing is in the pool, a new direct buffer is created. > ## The `close` method on this object is responsible for returning > the buffer to the pool so it can be reused. > ## Since we are not calling `close`: > ### The pool is of little use, since each request creates a new > direct buffer from memory. > ### The objects can be GCed and the allocated direct memory may be > returned on GC. But if the process crashes, the memory never goes back, > causing memory issues on the machine. > # DiskBlock: > ## This creates a file on disk to which the data-to-upload is written. This > file gets deleted in startUpload().close(). > > startUpload() gives an object of BlockUploadData, whose > `toByteArray()` method is used in AbfsOutputStream to get the byte array of the > DataBlock. 
> > Method which uses the DataBlock object: > https://github.com/apache/hadoop/blob/fac7d26c5d7f791565cc3ab45d079e2cca725f95/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java#L298
[jira] [Commented] (HADOOP-18873) ABFS: AbfsOutputStream doesnt close DataBlocks object.
[ https://issues.apache.org/jira/browse/HADOOP-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17761263#comment-17761263 ] Pranav Saxena commented on HADOOP-18873: pr: [HADOOP-18873. ABFS: AbfsOutputStream doesnt close DataBlocks object. by saxenapranav · Pull Request #6010 · apache/hadoop (github.com)|https://github.com/apache/hadoop/pull/6010] > ABFS: AbfsOutputStream doesnt close DataBlocks object. > -- > > Key: HADOOP-18873 > URL: https://issues.apache.org/jira/browse/HADOOP-18873 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.3.4 >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Major > Fix For: 3.3.4 > > > AbfsOutputStream doesn't close the DataBlock object created for the upload. > The implication of not doing that: > DataBlocks has three implementations: > # ByteArrayBlock > ## This creates an object of DataBlockByteArrayOutputStream (a child of > ByteArrayOutputStream): a wrapper around a byte array for populating and reading > the array. > ## This gets GCed. > # ByteBufferBlock: > ## There is a defined *DirectBufferPool* from which it tries to request a > direct buffer. > ## If nothing is in the pool, a new direct buffer is created. > ## The `close` method on this object is responsible for returning > the buffer to the pool so it can be reused. > ## Since we are not calling `close`: > ### The pool is of little use, since each request creates a new > direct buffer from memory. > ### The objects can be GCed and the allocated direct memory may be > returned on GC. But if the process crashes, the memory never goes back, > causing memory issues on the machine. > # DiskBlock: > ## This creates a file on disk to which the data-to-upload is written. This > file gets deleted in startUpload().close(). > > startUpload() gives an object of BlockUploadData, whose > `toByteArray()` method is used in AbfsOutputStream to get the byte array of the > DataBlock. 
> > Method which uses the DataBlock object: > https://github.com/apache/hadoop/blob/fac7d26c5d7f791565cc3ab45d079e2cca725f95/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java#L298
[jira] [Updated] (HADOOP-18873) ABFS: AbfsOutputStream doesnt close DataBlocks object.
[ https://issues.apache.org/jira/browse/HADOOP-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena updated HADOOP-18873: --- Description: AbfsOutputStream doesn't close the DataBlock object created for the upload. The implication of not doing that: DataBlocks has three implementations: # ByteArrayBlock ## This creates an object of DataBlockByteArrayOutputStream (a child of ByteArrayOutputStream): a wrapper around a byte array for populating and reading the array. ## This gets GCed. # ByteBufferBlock: ## There is a defined *DirectBufferPool* from which it tries to request a direct buffer. ## If nothing is in the pool, a new direct buffer is created. ## The `close` method on this object is responsible for returning the buffer to the pool so it can be reused. ## Since we are not calling `close`: ### The pool is of little use, since each request creates a new direct buffer from memory. ### The objects can be GCed and the allocated direct memory may be returned on GC. But if the process crashes, the memory never goes back, causing memory issues on the machine. # DiskBlock: ## This creates a file on disk to which the data-to-upload is written. This file gets deleted in startUpload().close(). startUpload() gives an object of BlockUploadData, whose `toByteArray()` method is used in AbfsOutputStream to get the byte array of the DataBlock. Method which uses the DataBlock object: https://github.com/apache/hadoop/blob/fac7d26c5d7f791565cc3ab45d079e2cca725f95/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/services/AbfsOutputStream.java#L298 was: AbfsOutputStream doesn't close the DataBlock object created for the upload. The implication of not doing that: DataBlocks has three implementations: # ByteArrayBlock ## This creates an object of DataBlockByteArrayOutputStream (a child of ByteArrayOutputStream): a wrapper around a byte array for populating and reading the array. ## This gets GCed. # ByteBufferBlock: ## There is a defined *DirectBufferPool* from which it tries to request a direct buffer. ## If nothing is in the pool, a new direct buffer is created. ## The `close` method on this object is responsible for returning the buffer to the pool so it can be reused. ## Since we are not calling `close`: ### The pool is of little use, since each request creates a new direct buffer from memory. ### The objects can be GCed and the allocated direct memory may be returned on GC. But if the process crashes, the memory never goes back, causing memory issues on the machine. # DiskBlock: ## This creates a file on disk to which the data-to-upload is written. This file gets deleted in startUpload().close(). startUpload() gives an object of BlockUploadData, whose `toByteArray()` method is used in AbfsOutputStream to get the byte array of the DataBlock. > ABFS: AbfsOutputStream doesnt close DataBlocks object. > -- > > Key: HADOOP-18873 > URL: https://issues.apache.org/jira/browse/HADOOP-18873 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.3.4 >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Major > Fix For: 3.3.4 > > > AbfsOutputStream doesn't close the DataBlock object created for the upload. > The implication of not doing that: > DataBlocks has three implementations: > # ByteArrayBlock > ## This creates an object of DataBlockByteArrayOutputStream (a child of > ByteArrayOutputStream): a wrapper around a byte array for populating and reading > the array. > ## This gets GCed. > # ByteBufferBlock: > ## There is a defined *DirectBufferPool* from which it tries to request a > direct buffer. > ## If nothing is in the pool, a new direct buffer is created. > ## The `close` method on this object is responsible for returning > the buffer to the pool so it can be reused. 
> ## Since we are not calling `close`: > ### The pool is of little use, since each request creates a new > direct buffer from memory. > ### The objects can be GCed and the allocated direct memory may be > returned on GC. But if the process crashes, the memory never goes back, > causing memory issues on the machine. > # DiskBlock: > ## This creates a file on disk to which the data-to-upload is written. This > file gets deleted in startUpload().close(). > > startUpload() gives an object of BlockUploadData, whose > `toByteArray()` method is used in AbfsOutputStream to get the byte array of the > DataBlock. > > Method which uses the DataBlock object: >
[jira] [Updated] (HADOOP-18873) ABFS: AbfsOutputStream doesnt close DataBlocks object.
[ https://issues.apache.org/jira/browse/HADOOP-18873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena updated HADOOP-18873: --- Description: AbfsOutputStream doesn't close the DataBlock object created for the upload. The implication of not doing that: DataBlocks has three implementations: # ByteArrayBlock ## This creates an object of DataBlockByteArrayOutputStream (a child of ByteArrayOutputStream): a wrapper around a byte array for populating and reading the array. ## This gets GCed. # ByteBufferBlock: ## There is a defined *DirectBufferPool* from which it tries to request a direct buffer. ## If nothing is in the pool, a new direct buffer is created. ## The `close` method on this object is responsible for returning the buffer to the pool so it can be reused. ## Since we are not calling `close`: ### The pool is of little use, since each request creates a new direct buffer from memory. ### The objects can be GCed and the allocated direct memory may be returned on GC. But if the process crashes, the memory never goes back, causing memory issues on the machine. # DiskBlock: ## This creates a file on disk to which the data-to-upload is written. This file gets deleted in startUpload().close(). startUpload() gives an object of BlockUploadData, whose `toByteArray()` method is used in AbfsOutputStream to get the byte array of the DataBlock. was: AbfsOutputStream doesn't close the DataBlock object created for the upload. The implication of not doing that: DataBlocks has three implementations: # ByteArrayBlock ## This creates an object of DataBlockByteArrayOutputStream (a child of ByteArrayOutputStream): a wrapper around a byte array for populating and reading the array. ## This gets GCed. # ByteBufferBlock: ## There is a defined *DirectBufferPool* from which it tries to request a direct buffer. ## If nothing is in the pool, a new direct buffer is created. ## The `close` method on this object is responsible for returning the buffer to the pool so it can be reused. ## Since we are not calling `close`: ### The pool is of little use, since each request creates a new direct buffer from memory. ### The objects can be GCed and the allocated direct memory may be returned on GC. But if the process crashes, the memory never goes back, causing memory issues on the machine. # DiskBlock: ## This creates a file on disk to which the data-to-upload is written. This file gets deleted in startUpload().close(). > ABFS: AbfsOutputStream doesnt close DataBlocks object. > -- > > Key: HADOOP-18873 > URL: https://issues.apache.org/jira/browse/HADOOP-18873 > Project: Hadoop Common > Issue Type: Sub-task >Affects Versions: 3.3.4 >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Major > Fix For: 3.3.4 > > > AbfsOutputStream doesn't close the DataBlock object created for the upload. > The implication of not doing that: > DataBlocks has three implementations: > # ByteArrayBlock > ## This creates an object of DataBlockByteArrayOutputStream (a child of > ByteArrayOutputStream): a wrapper around a byte array for populating and reading > the array. > ## This gets GCed. > # ByteBufferBlock: > ## There is a defined *DirectBufferPool* from which it tries to request a > direct buffer. > ## If nothing is in the pool, a new direct buffer is created. > ## The `close` method on this object is responsible for returning > the buffer to the pool so it can be reused. > ## Since we are not calling `close`: > ### The pool is of little use, since each request creates a new > direct buffer from memory. > ### The objects can be GCed and the allocated direct memory may be > returned on GC. But if the process crashes, the memory never goes back, > causing memory issues on the machine. > # DiskBlock: > ## This creates a file on disk to which the data-to-upload is written. 
This > file gets deleted in startUpload().close(). > > startUpload() gives an object of BlockUploadData, whose > `toByteArray()` method is used in AbfsOutputStream to get the byte array of the > DataBlock.
[jira] [Created] (HADOOP-18873) ABFS: AbfsOutputStream doesnt close DataBlocks object.
Pranav Saxena created HADOOP-18873: -- Summary: ABFS: AbfsOutputStream doesnt close DataBlocks object. Key: HADOOP-18873 URL: https://issues.apache.org/jira/browse/HADOOP-18873 Project: Hadoop Common Issue Type: Sub-task Affects Versions: 3.3.4 Reporter: Pranav Saxena Assignee: Pranav Saxena Fix For: 3.3.4 AbfsOutputStream doesn't close the DataBlock object created for the upload. The implication of not doing that: DataBlocks has three implementations: # ByteArrayBlock ## This creates an object of DataBlockByteArrayOutputStream (a child of ByteArrayOutputStream): a wrapper around a byte array for populating and reading the array. ## This gets GCed. # ByteBufferBlock: ## There is a defined *DirectBufferPool* from which it tries to request a direct buffer. ## If nothing is in the pool, a new direct buffer is created. ## The `close` method on this object is responsible for returning the buffer to the pool so it can be reused. ## Since we are not calling `close`: ### The pool is of little use, since each request creates a new direct buffer from memory. ### The objects can be GCed and the allocated direct memory may be returned on GC. But if the process crashes, the memory never goes back, causing memory issues on the machine. # DiskBlock: ## This creates a file on disk to which the data-to-upload is written. This file gets deleted in startUpload().close().
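The ByteBufferBlock pooling contract described above can be sketched as follows. This is a minimal illustration under assumed names: `PooledBufferBlock` and its pool are hypothetical stand-ins for the real DataBlocks/DirectBufferPool classes, whose APIs differ. The point is that `close()` must hand the direct buffer back to the pool; skipping it means every block allocates fresh direct memory.

```java
import java.nio.ByteBuffer;
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch: a block that borrows a direct buffer from a pool
// and returns it in close(), so try-with-resources guarantees reuse.
public class PooledBufferBlock implements AutoCloseable {
    private static final Deque<ByteBuffer> POOL = new ArrayDeque<>();
    private final ByteBuffer buffer;

    public PooledBufferBlock(int size) {
        ByteBuffer pooled = POOL.poll();
        // Reuse a pooled buffer when one is big enough, else allocate.
        buffer = (pooled != null && pooled.capacity() >= size)
                ? pooled : ByteBuffer.allocateDirect(size);
    }

    @Override
    public void close() {
        buffer.clear();
        POOL.push(buffer);   // returned for reuse instead of leaking
    }

    public static int poolSize() {
        return POOL.size();
    }
}
```

Wrapping each block in try-with-resources (or an explicit `close()` in a finally block) is what the issue asks AbfsOutputStream to do.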
[jira] [Commented] (HADOOP-18580) Special characters handling in Azure file system is different from others
[ https://issues.apache.org/jira/browse/HADOOP-18580?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17727787#comment-17727787 ] Pranav Saxena commented on HADOOP-18580: AbfsClient#createRequestUrl uses URLEncoder.encode(val, UTF_8). {code:java} String val = ".../part=a%25percent/test.parquet"; System.out.println(URLEncoder.encode(val, UTF_8)); {code} This gives `...%2Fpart%3Da%2525percent%2Ftest.parquet`. We could do something like: {code:java} URLEncoder.encode(val, UTF_8).replace("%2525", "%25") {code} > Special characters handling in Azure file system is different from others > - > > Key: HADOOP-18580 > URL: https://issues.apache.org/jira/browse/HADOOP-18580 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Affects Versions: 3.2.0 >Reporter: Yuya Ebihara >Priority: Minor > > Special characters handling in > [AzureBlobFileSystem|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-azure/src/main/java/org/apache/hadoop/fs/azurebfs/AzureBlobFileSystem.java] > looks different from other file systems, e.g. GoogleHadoopFileSystem. > For instance, FileSystem.open with ".../part=a%25percent/test.parquet" works > fine in other file systems, but ABFS requires > ".../part=a%percent/test.parquet" (%25 → %) because the path is URL-encoded > by AbfsClient#createRequestUrl internally. Can we change the behavior? Or can > I request adding javadoc to explain the behavior, if this is the expected > behavior?
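The double-encoding described above can be reproduced directly with the JDK's URLEncoder: a path segment that already contains a percent-escape gets its '%' re-encoded to "%25", turning "%25" into "%2525".

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Demonstrates the double-encoding: URLEncoder treats '%' as a plain
// character to escape, so an already-encoded "%25" becomes "%2525".
public class DoubleEncodeDemo {
    public static String encode(String val) {
        return URLEncoder.encode(val, StandardCharsets.UTF_8);
    }
}
```

Note that URLEncoder targets application/x-www-form-urlencoded, which is also why '/' becomes "%2F" in the comment's full-path example.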
[jira] [Commented] (HADOOP-18606) Add reason in x-ms-client-request-id on a retry API call.
[ https://issues.apache.org/jira/browse/HADOOP-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17697784#comment-17697784 ] Pranav Saxena commented on HADOOP-18606: PR for backport to branch-3.3: https://github.com/apache/hadoop/pull/5461 > Add reason in x-ms-client-request-id on a retry API call. > > > Key: HADOOP-18606 > URL: https://issues.apache.org/jira/browse/HADOOP-18606 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.0 > > > In the header, x-ms-client-request-id contains information on which retry this > particular API call is, for example: > :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1. > We want to add the reason for the retry to the header value: the same > header will include the retry reason whenever it is not the 0th iteration of the > API operation. It will look like > :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_RT. > This indicates that it is retry number 1, and that the 0th iteration failed due > to a read timeout.
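The header layout described above can be sketched as follows. This is a simplified, hypothetical builder: the real ABFS tracing header carries more fields, and `ClientRequestIdBuilder` is an illustrative name. It only shows how a retry count gains a `_<reason>` suffix (e.g. `1_RT` for retry 1 after a read timeout).

```java
// Hypothetical sketch of the x-ms-client-request-id layout described above:
// :<correlationId>:<clientRequestId>:::<opCode>:<retryCount>[_<retryReason>]
public class ClientRequestIdBuilder {
    public static String build(String correlationId, String requestId,
                               String opCode, int retryCount, String retryReason) {
        // Only non-zero retries carry a reason suffix.
        String suffix = (retryCount > 0 && retryReason != null)
                ? retryCount + "_" + retryReason
                : String.valueOf(retryCount);
        return ":" + correlationId + ":" + requestId + ":::" + opCode + ":" + suffix;
    }
}
```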
[jira] [Created] (HADOOP-18647) x-ms-client-request-id to have some way that identifies retry of an API.
Pranav Saxena created HADOOP-18647: -- Summary: x-ms-client-request-id to have some way that identifies retry of an API. Key: HADOOP-18647 URL: https://issues.apache.org/jira/browse/HADOOP-18647 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure Reporter: Pranav Saxena Assignee: Pranav Saxena Fix For: 3.4.0 In case the primaryRequestId in x-ms-client-request-id is an empty string, the retry's primaryRequestId has to contain the last part of the clientRequestId UUID.
[jira] [Updated] (HADOOP-18547) Check if config value is not empty string in AbfsConfiguration.getMandatoryPasswordString()
[ https://issues.apache.org/jira/browse/HADOOP-18547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena updated HADOOP-18547: --- Description: The method `getMandatoryPasswordString` is called in `AbfsConfiguration.getTokenProvider()` to check that the following configs are non-null (different keys apply to different implementations of AccessTokenProvider): 1. fs.azure.account.oauth2.client.endpoint: in ClientCredsTokenProvider 2. fs.azure.account.oauth2.client.id: in ClientCredsTokenProvider, MsiTokenProvider, RefreshTokenBasedTokenProvider 3. fs.azure.account.oauth2.client.secret: in ClientCredsTokenProvider 4. fs.azure.account.oauth2.client.endpoint: in UserPasswordTokenProvider 5. fs.azure.account.oauth2.user.name: in UserPasswordTokenProvider 6. fs.azure.account.oauth2.user.password: in UserPasswordTokenProvider 7. fs.azure.account.oauth2.msi.tenant: in MsiTokenProvider 8. fs.azure.account.oauth2.refresh.token: in RefreshTokenBasedTokenProvider Right now, this method only checks that the value is non-null. This task adds a check that the config values are also non-empty. > Check if config value is not empty string in > AbfsConfiguration.getMandatoryPasswordString() > --- > > Key: HADOOP-18547 > URL: https://issues.apache.org/jira/browse/HADOOP-18547 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Affects Versions: 3.3.4 >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Minor > Labels: pull-request-available > > The method `getMandatoryPasswordString` is called in > `AbfsConfiguration.getTokenProvider()` to check that the following configs are > non-null (different keys apply to different implementations of > AccessTokenProvider): > 1. fs.azure.account.oauth2.client.endpoint: in ClientCredsTokenProvider > 2. fs.azure.account.oauth2.client.id: in ClientCredsTokenProvider, > MsiTokenProvider, RefreshTokenBasedTokenProvider > 3. fs.azure.account.oauth2.client.secret: in ClientCredsTokenProvider > 4. fs.azure.account.oauth2.client.endpoint: in UserPasswordTokenProvider > 5. fs.azure.account.oauth2.user.name: in UserPasswordTokenProvider > 6. fs.azure.account.oauth2.user.password: in UserPasswordTokenProvider > 7. fs.azure.account.oauth2.msi.tenant: in MsiTokenProvider > 8. fs.azure.account.oauth2.refresh.token: in RefreshTokenBasedTokenProvider > Right now, this method only checks that the value is non-null. This task > adds a check that the config values are also non-empty. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-issues-h...@hadoop.apache.org
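The validation described above can be sketched as follows. This is a hypothetical standalone helper, not the actual `AbfsConfiguration` code; it only illustrates the check `getMandatoryPasswordString()` is expected to perform after this change.

```java
// Hypothetical sketch of the fix in HADOOP-18547: a standalone helper showing
// the validation getMandatoryPasswordString() should perform after the change.
// Not the actual AbfsConfiguration code.
final class MandatoryConfigCheck {

    /** Stand-in for the exception raised on an invalid mandatory config. */
    static final class ConfigException extends RuntimeException {
        ConfigException(String message) { super(message); }
    }

    /**
     * Returns the value for a mandatory key, rejecting null values
     * (the pre-existing check) and empty or blank ones (the new check).
     */
    static String getMandatoryNonEmpty(String key, String value) {
        if (value == null) {
            throw new ConfigException("Mandatory configuration " + key + " is missing");
        }
        if (value.trim().isEmpty()) {
            throw new ConfigException("Mandatory configuration " + key + " is an empty string");
        }
        return value;
    }
}
```

With this shape of check, a config such as fs.azure.account.oauth2.client.secret set to an empty string fails fast at token-provider construction instead of surfacing later as an authentication error.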
[jira] [Updated] (HADOOP-18606) Add reason in x-ms-client-request-id on a retry API call.
[ https://issues.apache.org/jira/browse/HADOOP-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena updated HADOOP-18606: --- Fix Version/s: 3.4.0 > Add reason in x-ms-client-request-id on a retry API call. > > > Key: HADOOP-18606 > URL: https://issues.apache.org/jira/browse/HADOOP-18606 > Project: Hadoop Common > Issue Type: Sub-task >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Minor > Fix For: 3.4.0 > > > In the header, x-ms-client-request-id contains information on which retry this > particular API call is; for example: > :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1. > We want to add the reason for the retry in the header value. The same > header would then include the retry reason whenever it is not the 0th iteration of the > API operation. It would look like > :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_RT, > which indicates that this is retry number 1 and that the 0th iteration failed due > to a read timeout.
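The suffix scheme above (":1" becoming ":1_RT" on a retried call) might be built roughly as follows; the class and method names are illustrative, not the actual ABFS implementation, and "RT" stands in for the read-timeout abbreviation from the example.

```java
// Hypothetical sketch of the header suffix scheme in HADOOP-18606. The class
// and method names are illustrative; "RT" stands in for the read-timeout reason.
final class ClientRequestIdSuffix {

    /**
     * Appends the retry count to the base client-request-id. On a retry
     * (count > 0) a short abbreviation of the previous failure is appended
     * after an underscore, e.g. ":1_RT" for retry 1 after a read timeout.
     */
    static String withRetryReason(String baseId, int retryCount, String reasonAbbreviation) {
        StringBuilder header = new StringBuilder(baseId).append(':').append(retryCount);
        if (retryCount > 0 && reasonAbbreviation != null && !reasonAbbreviation.isEmpty()) {
            header.append('_').append(reasonAbbreviation);
        }
        return header.toString();
    }
}
```

Keeping the reason inside the existing header value means server-side logs can correlate a retry with its cause without any new header being introduced.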
[jira] [Updated] (HADOOP-18606) Add reason in x-ms-client-request-id on a retry API call.
[ https://issues.apache.org/jira/browse/HADOOP-18606?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena updated HADOOP-18606: --- Component/s: fs/azure > Add reason in x-ms-client-request-id on a retry API call. > > > Key: HADOOP-18606 > URL: https://issues.apache.org/jira/browse/HADOOP-18606 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Minor > Fix For: 3.4.0 > > > In the header, x-ms-client-request-id contains information on which retry this > particular API call is; for example: > :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1. > We want to add the reason for the retry in the header value. The same > header would then include the retry reason whenever it is not the 0th iteration of the > API operation. It would look like > :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_RT, > which indicates that this is retry number 1 and that the 0th iteration failed due > to a read timeout.
[jira] [Created] (HADOOP-18606) Add reason in x-ms-client-request-id on a retry API call.
Pranav Saxena created HADOOP-18606: -- Summary: Add reason in x-ms-client-request-id on a retry API call. Key: HADOOP-18606 URL: https://issues.apache.org/jira/browse/HADOOP-18606 Project: Hadoop Common Issue Type: Sub-task Reporter: Pranav Saxena Assignee: Pranav Saxena In the header, x-ms-client-request-id contains information on which retry this particular API call is; for example: :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1. We want to add the reason for the retry in the header value. The same header would then include the retry reason whenever it is not the 0th iteration of the API operation. It would look like :eb06d8f6-5693-461b-b63c-5858fa7655e6:29cb0d19-2b68-4409-bc35-cb7160b90dd8:::CF:1_RT, which indicates that this is retry number 1 and that the 0th iteration failed due to a read timeout.
[jira] [Commented] (HADOOP-17912) ABFS: Support for Encryption Context
[ https://issues.apache.org/jira/browse/HADOOP-17912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17679670#comment-17679670 ] Pranav Saxena commented on HADOOP-17912: [~mehakmeet] [~mthakur], requesting you to kindly review the PR. Thanks. > ABFS: Support for Encryption Context > > > Key: HADOOP-17912 > URL: https://issues.apache.org/jira/browse/HADOOP-17912 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.1 >Reporter: Sumangala Patki >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Support for customer-provided encryption keys at the file level, superseding > the global (account-level) key use in HADOOP-17536. > The ABFS driver will support an "EncryptionContext" plugin for retrieving > encryption information, the implementation for which should be provided by > the client. The keys/context retrieved will be sent via request headers to > the server, which will store the encryption context. Subsequent REST calls to > the server that access data/user metadata of the file will require fetching the > encryption context through a GetFileProperties call and retrieving the key > from the custom provider, before sending the request.
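The plugin flow described in the issue body, create a context at file creation, then resolve the per-file key from that context on later calls, might look roughly like this. The interface, method names, and the toy in-memory provider are assumptions for illustration only, not the actual ABFS "EncryptionContext" API.

```java
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the client-supplied plugin flow; not the real ABFS API.
interface EncryptionContextProvider {
    /** Context sent as a request header at create time and stored server-side. */
    byte[] getEncryptionContext(String path);

    /** Resolves the per-file key from a context fetched via GetFileProperties. */
    byte[] getEncryptionKey(String path, byte[] encryptionContext);
}

/** Toy in-memory provider demonstrating the create-then-resolve round trip. */
final class InMemoryEncryptionProvider implements EncryptionContextProvider {
    private final Map<String, byte[]> keysByContext = new HashMap<>();

    @Override
    public byte[] getEncryptionContext(String path) {
        // Derive a per-file key and hand back an opaque context token.
        String context = "ctx:" + path;
        keysByContext.put(context, ("key-for-" + path).getBytes(StandardCharsets.UTF_8));
        return context.getBytes(StandardCharsets.UTF_8);
    }

    @Override
    public byte[] getEncryptionKey(String path, byte[] encryptionContext) {
        return keysByContext.get(new String(encryptionContext, StandardCharsets.UTF_8));
    }
}
```

The key point of the design is that the server only ever stores the opaque context; the key itself stays with the client-side provider.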
[jira] [Commented] (HADOOP-17912) ABFS: Support for Encryption Context
[ https://issues.apache.org/jira/browse/HADOOP-17912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17678488#comment-17678488 ] Pranav Saxena commented on HADOOP-17912: Hi [~ste...@apache.org], requesting you to kindly review the PR. Regards. > ABFS: Support for Encryption Context > > > Key: HADOOP-17912 > URL: https://issues.apache.org/jira/browse/HADOOP-17912 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.1 >Reporter: Sumangala Patki >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Support for customer-provided encryption keys at the file level, superseding > the global (account-level) key use in HADOOP-17536. > The ABFS driver will support an "EncryptionContext" plugin for retrieving > encryption information, the implementation for which should be provided by > the client. The keys/context retrieved will be sent via request headers to > the server, which will store the encryption context. Subsequent REST calls to > the server that access data/user metadata of the file will require fetching the > encryption context through a GetFileProperties call and retrieving the key > from the custom provider, before sending the request.
[jira] [Assigned] (HADOOP-18546) disable purging list of in progress reads in abfs stream closed
[ https://issues.apache.org/jira/browse/HADOOP-18546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena reassigned HADOOP-18546: -- Assignee: Pranav Saxena (was: Steve Loughran) > disable purging list of in progress reads in abfs stream closed > --- > > Key: HADOOP-18546 > URL: https://issues.apache.org/jira/browse/HADOOP-18546 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.4 >Reporter: Steve Loughran >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > > Turn off the prune of in-progress reads in > ReadBufferManager::purgeBuffersForStream. > This will ensure active prefetches for a closed stream complete. They will > then get to the completed list and hang around until evicted by timeout, but > at least prefetching will be safe.
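The intended behavior above, purging only completed buffers so active prefetches survive the stream close, can be sketched as follows; the types here are simplified stand-ins for the real ReadBufferManager internals, not the actual Hadoop code.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the change in HADOOP-18546: when a stream closes,
// purge only buffers whose prefetch has completed; in-progress reads are left
// to finish and later age out of the completed list by timeout.
final class ReadBufferPurgeSketch {
    enum BufferState { IN_PROGRESS, COMPLETED }

    static final class ReadBuffer {
        final String streamId;
        final BufferState state;
        ReadBuffer(String streamId, BufferState state) {
            this.streamId = streamId;
            this.state = state;
        }
    }

    /** Removes only completed buffers of the closed stream; active prefetches survive. */
    static void purgeBuffersForStream(List<ReadBuffer> buffers, String closedStreamId) {
        Iterator<ReadBuffer> it = buffers.iterator();
        while (it.hasNext()) {
            ReadBuffer buffer = it.next();
            if (buffer.streamId.equals(closedStreamId) && buffer.state == BufferState.COMPLETED) {
                it.remove();
            }
        }
    }
}
```

Leaving in-progress buffers alone avoids pulling a buffer out from under an active network read, which is the safety concern the issue describes.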
[jira] [Created] (HADOOP-18547) Check if config value is not empty string in AbfsConfiguration.getMandatoryPasswordString()
Pranav Saxena created HADOOP-18547: -- Summary: Check if config value is not empty string in AbfsConfiguration.getMandatoryPasswordString() Key: HADOOP-18547 URL: https://issues.apache.org/jira/browse/HADOOP-18547 Project: Hadoop Common Issue Type: Bug Components: fs/azure Affects Versions: 3.3.4 Reporter: Pranav Saxena Assignee: Pranav Saxena
[jira] [Updated] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric
[ https://issues.apache.org/jira/browse/HADOOP-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena updated HADOOP-18501: --- Description: Error Description: For a partial read (due to account backend throttling), the ABFS driver retries, but the retry doesn't add to the throttling metrics. In case of a partial read with a connection-reset exception, the ABFS driver retries the full request and doesn't add to the throttling metrics. Mitigation: In case of a partial read, the ABFS driver should retry only the remaining bytes, and the event should be added to the throttling metrics. was: Error Description: For partial read (due to account backend throttling), the ABFS driver doesn't retry and doesn't add up in the throttling metrics. Mitigation: Abfs Driver should retry for the remaining bytes. Also, it should be added in throttling metrics. > [ABFS]: Partial Read should add to throttling metric > > > Key: HADOOP-18501 > URL: https://issues.apache.org/jira/browse/HADOOP-18501 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Affects Versions: 3.3.4 >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Minor > > Error Description: > For a partial read (due to account backend throttling), the ABFS driver retries, > but the retry doesn't add to the throttling metrics. > In case of a partial read with a connection-reset exception, the ABFS driver retries > the full request and doesn't add to the throttling metrics. > Mitigation: > In case of a partial read, the ABFS driver should retry only the remaining bytes, and > the event should be added to the throttling metrics.
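The mitigation above, retrying only the remaining bytes and counting each short read toward the throttling metrics, can be sketched as follows; readFully, readOnce, and the metric counter are stand-ins for illustration, not the real ABFS read path.

```java
import java.util.function.IntBinaryOperator;

// Hypothetical sketch of the mitigation in HADOOP-18501: on a short (partial)
// read, retry only the remaining range and count the event toward throttling
// metrics. readOnce and the counter are stand-ins, not the real ABFS APIs.
final class PartialReadRetrySketch {
    int throttlingEvents = 0;

    /**
     * Reads length bytes starting at offset. Whenever the server returns
     * fewer bytes than requested (treated here as backend throttling), the
     * metric is incremented and only the remaining range is re-requested.
     *
     * @param readOnce stand-in for one server call: (offset, length) -> bytesRead
     */
    int readFully(int offset, int length, IntBinaryOperator readOnce) {
        int total = 0;
        while (total < length) {
            int remaining = length - total;
            int got = readOnce.applyAsInt(offset + total, remaining);
            if (got <= 0) {
                break; // EOF or hard failure; the real driver handles these separately
            }
            if (got < remaining) {
                throttlingEvents++; // partial read counts toward throttling metrics
            }
            total += got;
        }
        return total;
    }
}
```

Retrying only the remaining range avoids re-downloading bytes already received, which is the waste the full-request retry incurs.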
[jira] [Updated] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric
[ https://issues.apache.org/jira/browse/HADOOP-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena updated HADOOP-18501: --- Component/s: fs/azure > [ABFS]: Partial Read should add to throttling metric > > > Key: HADOOP-18501 > URL: https://issues.apache.org/jira/browse/HADOOP-18501 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Minor > > Error Description: > For a partial read (due to account backend throttling), the ABFS driver doesn't > retry and doesn't add to the throttling metrics. > Mitigation: > The ABFS driver should retry the remaining bytes. Also, the event should be added to > the throttling metrics.
[jira] [Updated] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric
[ https://issues.apache.org/jira/browse/HADOOP-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena updated HADOOP-18501: --- Affects Version/s: 3.3.4 > [ABFS]: Partial Read should add to throttling metric > > > Key: HADOOP-18501 > URL: https://issues.apache.org/jira/browse/HADOOP-18501 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure >Affects Versions: 3.3.4 >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Minor > > Error Description: > For a partial read (due to account backend throttling), the ABFS driver doesn't > retry and doesn't add to the throttling metrics. > Mitigation: > The ABFS driver should retry the remaining bytes. Also, the event should be added to > the throttling metrics.
[jira] [Commented] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric
[ https://issues.apache.org/jira/browse/HADOOP-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620950#comment-17620950 ] Pranav Saxena commented on HADOOP-18501: Branch WIP: https://github.com/pranavsaxena-microsoft/hadoop/tree/partialReadThrottle > [ABFS]: Partial Read should add to throttling metric > > > Key: HADOOP-18501 > URL: https://issues.apache.org/jira/browse/HADOOP-18501 > Project: Hadoop Common > Issue Type: Bug >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Minor > > Error Description: > For a partial read (due to account backend throttling), the ABFS driver doesn't > retry and doesn't add to the throttling metrics. > Mitigation: > The ABFS driver should retry the remaining bytes. Also, the event should be added to > the throttling metrics.
[jira] [Updated] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric
[ https://issues.apache.org/jira/browse/HADOOP-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena updated HADOOP-18501: --- Description: Error Description: For a partial read (due to account backend throttling), the ABFS driver doesn't retry and doesn't add to the throttling metrics. Mitigation: The ABFS driver should retry the remaining bytes. Also, the event should be added to the throttling metrics. was: Error Description: At present, SAS Tokens generated from the Azure Portal may or may not contain a ? as a prefix. SAS Tokens that contain the ? prefix will lead to an error in the driver due to a clash of query parameters. This leads to customers having to manually remove the ? prefix before passing the SAS Tokens. Mitigation: After receiving the SAS Token from the provider, check if any prefix ? is present or not. If present, remove it and pass the SAS Token. > [ABFS]: Partial Read should add to throttling metric > > > Key: HADOOP-18501 > URL: https://issues.apache.org/jira/browse/HADOOP-18501 > Project: Hadoop Common > Issue Type: Bug >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Minor > > Error Description: > For a partial read (due to account backend throttling), the ABFS driver doesn't > retry and doesn't add to the throttling metrics. > Mitigation: > The ABFS driver should retry the remaining bytes. Also, the event should be added to > the throttling metrics.
[jira] [Created] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric
Pranav Saxena created HADOOP-18501: -- Summary: [ABFS]: Partial Read should add to throttling metric Key: HADOOP-18501 URL: https://issues.apache.org/jira/browse/HADOOP-18501 Project: Hadoop Common Issue Type: Bug Reporter: Pranav Saxena Assignee: Sree Bhattacharyya Error Description: At present, SAS Tokens generated from the Azure Portal may or may not contain a ? as a prefix. SAS Tokens that contain the ? prefix will lead to an error in the driver due to a clash of query parameters. This leads to customers having to manually remove the ? prefix before passing the SAS Tokens. Mitigation: After receiving the SAS Token from the provider, check if any prefix ? is present or not. If present, remove it and pass the SAS Token.
[jira] [Assigned] (HADOOP-18501) [ABFS]: Partial Read should add to throttling metric
[ https://issues.apache.org/jira/browse/HADOOP-18501?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena reassigned HADOOP-18501: -- Assignee: Pranav Saxena (was: Sree Bhattacharyya) > [ABFS]: Partial Read should add to throttling metric > > > Key: HADOOP-18501 > URL: https://issues.apache.org/jira/browse/HADOOP-18501 > Project: Hadoop Common > Issue Type: Bug >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Minor > > Error Description: > At present, SAS Tokens generated from the Azure Portal may or may not contain > a ? as a prefix. SAS Tokens that contain the ? prefix will lead to an error > in the driver due to a clash of query parameters. This leads to customers > having to manually remove the ? prefix before passing the SAS Tokens. > Mitigation: > After receiving the SAS Token from the provider, check if any prefix ? is > present or not. If present, remove it and pass the SAS Token.
[jira] [Commented] (HADOOP-17912) ABFS: Support for Encryption Context
[ https://issues.apache.org/jira/browse/HADOOP-17912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17613289#comment-17613289 ] Pranav Saxena commented on HADOOP-17912: [~ste...@apache.org], requesting you to kindly review the PR: https://github.com/apache/hadoop/pull/3440. Thanks. > ABFS: Support for Encryption Context > > > Key: HADOOP-17912 > URL: https://issues.apache.org/jira/browse/HADOOP-17912 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.1 >Reporter: Sumangala Patki >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Support for customer-provided encryption keys at the file level, superseding > the global (account-level) key use in HADOOP-17536. > The ABFS driver will support an "EncryptionContext" plugin for retrieving > encryption information, the implementation for which should be provided by > the client. The keys/context retrieved will be sent via request headers to > the server, which will store the encryption context. Subsequent REST calls to > the server that access data/user metadata of the file will require fetching the > encryption context through a GetFileProperties call and retrieving the key > from the custom provider, before sending the request.
[jira] [Assigned] (HADOOP-18408) [ABFS]: Ignore run of ITestAbfsRenameStageFailure for NonHNS-SharedKey configuration
[ https://issues.apache.org/jira/browse/HADOOP-18408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena reassigned HADOOP-18408: -- Assignee: Sree Bhattacharyya (was: Pranav Saxena) > [ABFS]: Ignore run of ITestAbfsRenameStageFailure for NonHNS-SharedKey > configuration > > > Key: HADOOP-18408 > URL: https://issues.apache.org/jira/browse/HADOOP-18408 > Project: Hadoop Common > Issue Type: Bug > Components: fs/azure, test >Reporter: Pranav Saxena >Assignee: Sree Bhattacharyya >Priority: Minor > Labels: pull-request-available > > ITestAbfsRenameStageFailure fails for the NonHNS-SharedKey configuration. > Failure: > [ERROR] > ITestAbfsRenameStageFailure>TestRenameStageFailure.testResilienceAsExpected:126 > [resilient commit support] expected:<[tru]e> but was:<[fals]e> > RCA: > ResilientCommit checks whether etags are preserved on rename; if not, it > throws an exception and the flag for resilientCommitByRename stays null, > ultimately leading to the test failure. > Mitigation: > Since etags are not preserved on rename in a nonHNS account, a test run for a > nonHNS account is not a valid case. Hence, as part of this task, we shall > ignore this test for the nonHNS configuration.
[jira] [Created] (HADOOP-18408) [ABFS]: Ignore run of ITestAbfsRenameStageFailure for NonHNS-SharedKey configuration
Pranav Saxena created HADOOP-18408: -- Summary: [ABFS]: Ignore run of ITestAbfsRenameStageFailure for NonHNS-SharedKey configuration Key: HADOOP-18408 URL: https://issues.apache.org/jira/browse/HADOOP-18408 Project: Hadoop Common Issue Type: Bug Components: fs/azure Reporter: Pranav Saxena Assignee: Pranav Saxena ITestAbfsRenameStageFailure fails for the NonHNS-SharedKey configuration. Failure: [ERROR] ITestAbfsRenameStageFailure>TestRenameStageFailure.testResilienceAsExpected:126 [resilient commit support] expected:<[tru]e> but was:<[fals]e> RCA: ResilientCommit checks whether etags are preserved on rename; if not, it throws an exception and the flag for resilientCommitByRename stays null, ultimately leading to the test failure. Mitigation: Since etags are not preserved on rename in a nonHNS account, a test run for a nonHNS account is not a valid case. Hence, as part of this task, we shall ignore this test for the nonHNS configuration.
[jira] [Assigned] (HADOOP-17912) ABFS: Support for Encryption Context
[ https://issues.apache.org/jira/browse/HADOOP-17912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pranav Saxena reassigned HADOOP-17912: -- Assignee: Pranav Saxena (was: Sumangala Patki) > ABFS: Support for Encryption Context > > > Key: HADOOP-17912 > URL: https://issues.apache.org/jira/browse/HADOOP-17912 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.1 >Reporter: Sumangala Patki >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > Time Spent: 1h > Remaining Estimate: 0h > > Support for customer-provided encryption keys at the file level, superseding > the global (account-level) key use in HADOOP-17536. > The ABFS driver will support an "EncryptionContext" plugin for retrieving > encryption information, the implementation for which should be provided by > the client. The keys/context retrieved will be sent via request headers to > the server, which will store the encryption context. Subsequent REST calls to > the server that access data/user metadata of the file will require fetching the > encryption context through a GetFileProperties call and retrieving the key > from the custom provider, before sending the request.