[jira] [Created] (HADOOP-19245) S3ABlockOutputStream no longer sends progress events in close()
Steve Loughran created HADOOP-19245:
---
Summary: S3ABlockOutputStream no longer sends progress events in close()
Key: HADOOP-19245
URL: https://issues.apache.org/jira/browse/HADOOP-19245
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran

Progress events are no longer passed through from S3ABlockOutputStream to any progress callback passed in which doesn't implement ProgressListener. This is due to a signature mismatch between the changed ProgressableListener interface and the {{S3ABlockOutputStream.ProgressListener}} impl.
* critical because distcp jobs will time out on large uploads without this
* trivial to fix; does need a test
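For readers unfamiliar with how a signature mismatch can silently drop callbacks, here is a minimal sketch of the pitfall with purely hypothetical names (not the actual S3A types): once the interface method changes shape and carries a default body, an implementation that kept the old parameter list becomes a dead overload that nothing ever invokes.

{code:java}
// Hypothetical illustration of the failure mode, not the Hadoop source.
interface ProgressEventListener {
  // the interface method changed shape and has a default no-op body
  default void progressChanged(long bytesTransferred, int blocksUploaded) { }
}

class StreamProgressListener implements ProgressEventListener {
  // intended as an override, but the parameter list no longer matches,
  // so this is a plain overload: the default no-op runs instead and
  // progress events are silently dropped. An @Override annotation
  // would have turned this into a compile-time error.
  public void progressChanged(long bytesTransferred) {
    System.out.println("transferred " + bytesTransferred);
  }
}
{code}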
[jira] [Created] (HADOOP-19244) Pullout arch-agnostic maven javadoc plugin configurations in hadoop-common
Cheng Pan created HADOOP-19244:
--
Summary: Pullout arch-agnostic maven javadoc plugin configurations in hadoop-common
Key: HADOOP-19244
URL: https://issues.apache.org/jira/browse/HADOOP-19244
Project: Hadoop Common
Issue Type: Improvement
Components: build, common
Reporter: Cheng Pan
[jira] [Created] (HADOOP-19243) Upgrade Mockito version to 4.11.0
Muskan Mishra created HADOOP-19243:
--
Summary: Upgrade Mockito version to 4.11.0
Key: HADOOP-19243
URL: https://issues.apache.org/jira/browse/HADOOP-19243
Project: Hadoop Common
Issue Type: Task
Reporter: Muskan Mishra
Assignee: Muskan Mishra

While compiling test classes with JDK 17, we faced an error related to Mockito: *Mockito cannot mock this class.* To make the build compatible with JDK 17 we have to upgrade the versions of both mockito-core and mockito-inline.
[jira] [Created] (HADOOP-19242) Add a feature to disable redirection for the OSS connector.
zhouao created HADOOP-19242:
---
Summary: Add a feature to disable redirection for the OSS connector.
Key: HADOOP-19242
URL: https://issues.apache.org/jira/browse/HADOOP-19242
Project: Hadoop Common
Issue Type: Improvement
Components: fs/oss
Affects Versions: 3.3.2, 3.1.0
Reporter: zhouao

For security reasons, some users of the OSS connector wish to disable the connector's HTTP redirection functionality. The OSS Java SDK has the capability to turn off HTTP redirection, but the configuration is not exposed in the {{core-site.xml}} file. This change primarily involves adding a flag to disable HTTP redirection via the {{core-site.xml}} file.
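As a rough sketch of what wiring such a flag could look like; the property name and the SDK setter below are assumptions, not the actual patch:

{code:java}
// Sketch only: "fs.oss.redirect.enable" is a hypothetical key, and this
// assumes the Aliyun OSS SDK client configuration exposes a redirect toggle.
import com.aliyun.oss.ClientBuilderConfiguration;
import org.apache.hadoop.conf.Configuration;

public final class OssRedirectConfigSketch {
  public static final String REDIRECT_ENABLE_KEY = "fs.oss.redirect.enable";
  public static final boolean REDIRECT_ENABLE_DEFAULT = true;

  public static ClientBuilderConfiguration buildClientConfig(Configuration conf) {
    ClientBuilderConfiguration clientConf = new ClientBuilderConfiguration();
    // assumed setter; the SDK's client configuration carries the
    // equivalent redirect flag that this change would surface
    clientConf.setRedirectEnable(
        conf.getBoolean(REDIRECT_ENABLE_KEY, REDIRECT_ENABLE_DEFAULT));
    return clientConf;
  }
}
{code}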
[jira] [Created] (HADOOP-19241) NoSuchMethodError in aws sdk third party logger in hadoop aws 3.4
ashutoshraina created HADOOP-19241:
--
Summary: NoSuchMethodError in aws sdk third party logger in hadoop aws 3.4
Key: HADOOP-19241
URL: https://issues.apache.org/jira/browse/HADOOP-19241
Project: Hadoop Common
Issue Type: Bug
Components: hadoop-thirdparty, tools
Affects Versions: 3.4.0
Reporter: ashutoshraina

{code:java}
"localizedMessage": "java.lang.NoSuchMethodError: 'software.amazon.awssdk.thirdparty.org.slf4j.Logger software.amazon.awssdk.utils.Logger.logger()'",
"message": "java.lang.NoSuchMethodError: 'software.amazon.awssdk.thirdparty.org.slf4j.Logger software.amazon.awssdk.utils.Logger.logger()'",
"name": "com.google.common.util.concurrent.ExecutionError",
"cause": {
  "commonElementCount": 1,
  "localizedMessage": "'software.amazon.awssdk.thirdparty.org.slf4j.Logger software.amazon.awssdk.utils.Logger.logger()'",
  "message": "'software.amazon.awssdk.thirdparty.org.slf4j.Logger software.amazon.awssdk.utils.Logger.logger()'",
  "name": "java.lang.NoSuchMethodError",
  "extendedStackTrace": [
    { "class": "software.amazon.awssdk.transfer.s3.internal.GenericS3TransferManager", "method": "close", "file": "GenericS3TransferManager.java", "line": 393, "exact": false, "location": "bundle-2.23.19.jar", "version": "?" },
    { "class": "org.apache.hadoop.fs.s3a.S3AUtils", "method": "closeAutocloseables", "file": "S3AUtils.java", "line": 1553, "exact": false, "location": "hadoop-aws-3.4.0.jar", "version": "?" },
    { "class": "org.apache.hadoop.fs.s3a.S3AFileSystem", "method": "stopAllServices", "file": "S3AFileSystem.java", "line": 4358, "exact": false, "location": "hadoop-aws-3.4.0.jar", "version": "?" },
    { "class": "org.apache.hadoop.fs.s3a.S3AFileSystem", "method": "initialize", "file": "S3AFileSystem.java", "line": 758, "exact": false, "location": "hadoop-aws-3.4.0.jar", "version": "?" },
    { "class": "org.apache.hadoop.fs.FileSystem", "method": "createFileSystem", "file": "FileSystem.java", "line": 3601, "exact": false, "location": "hadoop-common-3.4.0.jar", "version": "?" },
    { "class": "org.apache.hadoop.fs.FileSystem", "method": "get", "file": "FileSystem.java", "line": 552, "exact": false, "location": "hadoop-common-3.4.0.jar", "version": "?" },
{code}

This appears to be related to how shading works in the aws bundle sdk.
Versions: Hadoop-AWS 3.4, AWS-SDK-Bundle 2.23.19
[jira] [Resolved] (HADOOP-19161) S3A: option "fs.s3a.performance.flags" to take list of performance flags
[ https://issues.apache.org/jira/browse/HADOOP-19161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-19161.
-
Fix Version/s: 3.5.0, 3.4.1
Resolution: Fixed

> S3A: option "fs.s3a.performance.flags" to take list of performance flags
> -------------------------------------------------------------------------
>
> Key: HADOOP-19161
> URL: https://issues.apache.org/jira/browse/HADOOP-19161
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 3.4.1
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
> HADOOP-19072 shows we want to add more optimisations than those of HADOOP-18930.
> * Extending the new optimisations to the existing option is brittle
> * Adding explicit options for each feature gets complex fast.
> Proposed:
> * A new class S3APerformanceFlags keeps all the flags
> * it builds this from a string[] of values, which can be extracted from getConf()
> * and it can also support a "*" option to mean "everything"
> * this class can also be handed off to hasPathCapability() and do the right thing.
> Proposed optimisations:
> * create file (we will hook up HADOOP-18930)
> * mkdir (HADOOP-19072)
> * delete (probe for parent path)
> * rename (probe for source path)
> We could think of more, with different names, later.
> The goal is to make it possible to strip out every HTTP request we do for safety/posix compliance, so applications have the option of turning off what they don't need.
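To make the flag-list idea concrete, here is a minimal sketch (hypothetical names, not the committed code) of parsing a comma-separated flag option where "*" means everything:

{code:java}
// Minimal sketch of the flag-set idea with illustrative names: parse a
// comma-separated list from getConf(), where "*" enables every flag.
import java.util.EnumSet;

public final class S3APerformanceFlagsSketch {
  enum Flag { CREATE, MKDIR, DELETE, RENAME }

  static EnumSet<Flag> parse(String value) {
    EnumSet<Flag> flags = EnumSet.noneOf(Flag.class);
    if (value == null) {
      return flags;
    }
    for (String s : value.split(",")) {
      s = s.trim();
      if (s.equals("*")) {
        return EnumSet.allOf(Flag.class);   // "*" means "everything"
      }
      if (!s.isEmpty()) {
        flags.add(Flag.valueOf(s.toUpperCase()));
      }
    }
    return flags;
  }
}
{code}

A hasPathCapability() probe then reduces to a membership test on the parsed set.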
[jira] [Created] (HADOOP-19239) Enhance FileSystem to honor token and expiration in its cache
Xiang Li created HADOOP-19239:
-
Summary: Enhance FileSystem to honor token and expiration in its cache
Key: HADOOP-19239
URL: https://issues.apache.org/jira/browse/HADOOP-19239
Project: Hadoop Common
Issue Type: Improvement
Components: fs
Affects Versions: 3.3.6
Reporter: Xiang Li
Fix For: 3.3.4
[jira] [Created] (HADOOP-19238) Fix create-release script for arm64 based MacOS
Mukund Thakur created HADOOP-19238:
--
Summary: Fix create-release script for arm64 based MacOS
Key: HADOOP-19238
URL: https://issues.apache.org/jira/browse/HADOOP-19238
Project: Hadoop Common
Issue Type: Bug
Reporter: Mukund Thakur
[jira] [Created] (HADOOP-19237) upgrade dnsjava to 3.6.0 due to CVEs
PJ Fanning created HADOOP-19237:
---
Summary: upgrade dnsjava to 3.6.0 due to CVEs
Key: HADOOP-19237
URL: https://issues.apache.org/jira/browse/HADOOP-19237
Project: Hadoop Common
Issue Type: Task
Reporter: PJ Fanning

See https://github.com/apache/hadoop/pull/6955 - but this is missing the necessary change to LICENSE-binary (which already has an out of date version for dnsjava).
* CVE-2023-32695
* CVE-2024-25638
* https://github.com/advisories/GHSA-crjg-w57m-rqqf
[jira] [Created] (HADOOP-19236) Integration of Volcano Engine TOS in Hadoop.
Jinglun created HADOOP-19236:
Summary: Integration of Volcano Engine TOS in Hadoop.
Key: HADOOP-19236
URL: https://issues.apache.org/jira/browse/HADOOP-19236
Project: Hadoop Common
Issue Type: New Feature
Components: fs, tools
Reporter: Jinglun

Volcano Engine is a fast-growing cloud vendor launched by ByteDance, and TOS is the object storage service of Volcano Engine. A common pattern is to store data in TOS and run Hadoop/Spark/Flink applications to access it. But there is no native support for TOS in Hadoop, so it is not easy for users to build their big data systems on TOS.
This work aims to integrate TOS with Hadoop to help users run their applications on TOS. Users only need to do some simple configuration, and then their applications can read/write TOS without any code change. This work is similar to the AWS S3, Azure Blob, AliyunOSS, Tencent COS and HuaweiCloud Object Storage integrations in Hadoop.
[jira] [Created] (HADOOP-19235) IPC client uses CompletableFuture to support asynchronous operations.
Jian Zhang created HADOOP-19235:
---
Summary: IPC client uses CompletableFuture to support asynchronous operations.
Key: HADOOP-19235
URL: https://issues.apache.org/jira/browse/HADOOP-19235
Project: Hadoop Common
Issue Type: New Feature
Components: common
Reporter: Jian Zhang

h3. Description
The existing asynchronous ipc.Client work (HADOOP-13226, HDFS-10224, etc.) does not support {{CompletableFuture}}; instead, it relies on setting up callbacks, which can lead to the "callback hell" problem. Using {{CompletableFuture}} organizes asynchronous callbacks much better. Therefore, building on the existing implementation, {{CompletableFuture}} is used so that once the {{client.call}} completes, an asynchronous thread handles the response of the call without blocking the main thread.

*Test*
New UT TestAsyncIPC#testAsyncCallWithCompletableFuture()
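The pattern being proposed looks roughly like the following sketch; the types here are stand-ins, not the real ipc.Client API:

{code:java}
// Sketch of the pattern (hypothetical types): complete a CompletableFuture
// from the existing callback so callers can chain response handling
// instead of nesting callbacks.
import java.io.IOException;
import java.util.concurrent.CompletableFuture;

public class AsyncCallSketch {
  interface Callback {
    void onSuccess(String response);
    void onFailure(IOException e);
  }

  // stand-in for the existing callback-based async call
  static void callWithCallback(String request, Callback cb) {
    cb.onSuccess("response to " + request);
  }

  static CompletableFuture<String> asyncCall(String request) {
    CompletableFuture<String> future = new CompletableFuture<>();
    callWithCallback(request, new Callback() {
      @Override public void onSuccess(String response) { future.complete(response); }
      @Override public void onFailure(IOException e) { future.completeExceptionally(e); }
    });
    return future;
  }

  public static void main(String[] args) {
    asyncCall("getBlockLocations")
        .thenApply(String::toUpperCase)   // chained, no callback nesting
        .thenAccept(System.out::println)
        .join();
  }
}
{code}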
[jira] [Created] (HADOOP-19234) ABFS: [FnsOverBlob] Adding Integration Tests for Special Scenarios in Blob Endpoint
Anuj Modi created HADOOP-19234:
--
Summary: ABFS: [FnsOverBlob] Adding Integration Tests for Special Scenarios in Blob Endpoint
Key: HADOOP-19234
URL: https://issues.apache.org/jira/browse/HADOOP-19234
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Affects Versions: 3.4.0
Reporter: Anuj Modi

FNS accounts do not understand directories, and to create that abstraction the client has to handle the cases where HDFS operations involve interactions with directory paths. This needs some additional testing for each HDFS operation where the path can exist as a directory. More details to follow.
Prerequisites:
# HADOOP-19187 ABFS: [FnsOverBlob] Making AbfsClient Abstract for supporting both DFS and Blob Endpoint
# HADOOP-19207 ABFS: [FnsOverBlob] Response Handling of Blob Endpoint APIs and Metadata APIs
# HADOOP-19226 ABFS: [FnsOverBlob] Implementing Azure Rest APIs on Blob Endpoint for AbfsBlobClient
# HADOOP-19232 ABFS: [FnsOverBlob] Implementing Ingress Support with various Fallback Handling
# HADOOP-19233 ABFS: [FnsOverBlob] Implementing Rename and Delete APIs over Blob Endpoint
[jira] [Created] (HADOOP-19233) ABFS: [FnsOverBlob] Implementing Rename and Delete APIs over Blob Endpoint
Anuj Modi created HADOOP-19233:
--
Summary: ABFS: [FnsOverBlob] Implementing Rename and Delete APIs over Blob Endpoint
Key: HADOOP-19233
URL: https://issues.apache.org/jira/browse/HADOOP-19233
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Affects Versions: 3.4.0
Reporter: Anuj Modi
Assignee: Anuj Modi

Enable rename and delete over the Blob endpoint. The endpoint supports neither a rename API nor directory delete, so all the orchestration and handling has to be added on the client side. More details will follow.
Prerequisites for this patch:
1. HADOOP-19187 ABFS: [FnsOverBlob] Making AbfsClient Abstract for supporting both DFS and Blob Endpoint - ASF JIRA (apache.org)
2. HADOOP-19226 ABFS: [FnsOverBlob] Implementing Azure Rest APIs on Blob Endpoint for AbfsBlobClient - ASF JIRA (apache.org)
3. HADOOP-19207 ABFS: [FnsOverBlob] Response Handling of Blob Endpoint APIs and Metadata APIs - ASF JIRA (apache.org)
[jira] [Created] (HADOOP-19232) ABFS: [FnsOverBlob] Implementing Ingress Support with various Fallback Handling
Anuj Modi created HADOOP-19232:
--
Summary: ABFS: [FnsOverBlob] Implementing Ingress Support with various Fallback Handling
Key: HADOOP-19232
URL: https://issues.apache.org/jira/browse/HADOOP-19232
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Affects Versions: 3.4.0
Reporter: Anuj Modi
Assignee: Anmol Asrani

The scope of this task is to refactor the AbfsOutputStream class to handle the ingress for the DFS and Blob endpoints effectively. More details will be added soon.
Prerequisites for this patch:
1. [HADOOP-19187] ABFS: [FnsOverBlob] Making AbfsClient Abstract for supporting both DFS and Blob Endpoint - ASF JIRA (apache.org)
2. [HADOOP-19226] ABFS: [FnsOverBlob] Implementing Azure Rest APIs on Blob Endpoint for AbfsBlobClient - ASF JIRA (apache.org)
3. [HADOOP-19207] ABFS: [FnsOverBlob] Response Handling of Blob Endpoint APIs and Metadata APIs - ASF JIRA (apache.org)
[jira] [Created] (HADOOP-19231) add JacksonUtil to centralise some code
PJ Fanning created HADOOP-19231:
---
Summary: add JacksonUtil to centralise some code
Key: HADOOP-19231
URL: https://issues.apache.org/jira/browse/HADOOP-19231
Project: Hadoop Common
Issue Type: Task
Reporter: PJ Fanning

To future-proof Hadoop against Jackson changes, it makes sense not to create ObjectMappers and JsonFactories in many different places in the Hadoop code. One of the main drivers of this is https://www.javadoc.io/doc/com.fasterxml.jackson.core/jackson-core/latest/com/fasterxml/jackson/core/StreamReadConstraints.html
Jackson 3 (not yet scheduled for release) has some fairly big API and behaviour changes too.
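A minimal sketch of what such a utility could look like, assuming Jackson 2.15+ for the StreamReadConstraints API (the class and method names here are illustrative, not the committed code):

{code:java}
// Sketch of the centralisation idea: one place to build JsonFactory and
// ObjectMapper instances, so stream-read constraints and any future
// Jackson 3 migration are handled once rather than at every call site.
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.databind.ObjectMapper;

public final class JacksonUtil {
  private JacksonUtil() {}

  public static JsonFactory createBasicJsonFactory() {
    // constraints are configured here once instead of per call site
    return JsonFactory.builder()
        .streamReadConstraints(
            StreamReadConstraints.builder()
                .maxStringLength(Integer.MAX_VALUE)
                .build())
        .build();
  }

  public static ObjectMapper createBasicObjectMapper() {
    return new ObjectMapper(createBasicJsonFactory());
  }
}
{code}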
[jira] [Resolved] (HADOOP-19228) ShellCommandFencer#setConfAsEnvVars should also replace '-' with '_'.
[ https://issues.apache.org/jira/browse/HADOOP-19228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoqiao He resolved HADOOP-19228.
--
Fix Version/s: 3.5.0
Hadoop Flags: Reviewed
Resolution: Fixed

> ShellCommandFencer#setConfAsEnvVars should also replace '-' with '_'.
> ----------------------------------------------------------------------
>
> Key: HADOOP-19228
> URL: https://issues.apache.org/jira/browse/HADOOP-19228
> Project: Hadoop Common
> Issue Type: Improvement
> Reporter: fuchaohong
> Assignee: fuchaohong
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
> When setting configuration into environment variables, '-' should also be replaced with '_'.
[jira] [Resolved] (HADOOP-19227) ipc.Server accelerate token negotiation only for the default mechanism
[ https://issues.apache.org/jira/browse/HADOOP-19227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tsz-wo Sze resolved HADOOP-19227.
-
Fix Version/s: 3.3.7
Hadoop Flags: Reviewed
Resolution: Fixed

The pull request is now merged.

> ipc.Server accelerate token negotiation only for the default mechanism
> -----------------------------------------------------------------------
>
> Key: HADOOP-19227
> URL: https://issues.apache.org/jira/browse/HADOOP-19227
> Project: Hadoop Common
> Issue Type: Improvement
> Components: ipc
> Reporter: Tsz-wo Sze
> Assignee: Tsz-wo Sze
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.3.7
>
> {code}
> //Server.java
> // accelerate token negotiation by sending initial challenge
> // in the negotiation response
> if (enabledAuthMethods.contains(AuthMethod.TOKEN)) {
>   ...
> }
> {code}
> In Server.Connection.buildSaslNegotiateResponse() above, it accelerates token negotiation by sending an initial challenge in the negotiation response. However, this is non-standard SASL negotiation. We should do it only for the default SASL mechanism.
[jira] [Created] (HADOOP-19230) upgrade to jackson 2.14.3
PJ Fanning created HADOOP-19230:
---
Summary: upgrade to jackson 2.14.3
Key: HADOOP-19230
URL: https://issues.apache.org/jira/browse/HADOOP-19230
Project: Hadoop Common
Issue Type: Task
Components: common
Reporter: PJ Fanning

Follow up to HADOOP-18332. I have what I believe is a fix for the Jackson JAX-RS incompatibility: https://github.com/pjfanning/jsr311-compat/
The reason I want to start by just going to Jackson 2.14 is that Jackson 2.15 adds new StreamReadConstraints to protect against malicious JSON inputs. The constraints are generous but can cause issues with very large or deeply nested inputs.
[jira] [Created] (HADOOP-19229) Vector IO: have a max distance between ranges to range
Steve Loughran created HADOOP-19229:
---
Summary: Vector IO: have a max distance between ranges to range
Key: HADOOP-19229
URL: https://issues.apache.org/jira/browse/HADOOP-19229
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 3.4.1
Reporter: Steve Loughran

Vector IO has a max size for coalescing ranges, but it also needs a maximum gap between ranges to justify a merge.
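The merge test would then look something like this sketch (hypothetical names and signature, not the committed change):

{code:java}
// Sketch of the proposed check: two ranges are only coalesced when the
// combined size stays under the max merged size AND the gap between them
// is within the new maximum gap.
public final class RangeMergeSketch {
  static boolean shouldMerge(long end1, long start2, long mergedSize,
      long maxMergedSize, long maxGapBetweenRanges) {
    long gap = start2 - end1;            // bytes read and discarded if merged
    return gap >= 0
        && gap <= maxGapBetweenRanges    // the new check proposed here
        && mergedSize <= maxMergedSize;  // the existing size check
  }
}
{code}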
[jira] [Resolved] (HADOOP-19218) Avoid DNS lookup while creating IPC Connection object
[ https://issues.apache.org/jira/browse/HADOOP-19218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiaoqiao He resolved HADOOP-19218.
--
Fix Version/s: 3.5.0
Hadoop Flags: Reviewed
Target Version/s: (was: 3.3.9, 3.5.0, 3.4.1)
Resolution: Fixed

> Avoid DNS lookup while creating IPC Connection object
> ------------------------------------------------------
>
> Key: HADOOP-19218
> URL: https://issues.apache.org/jira/browse/HADOOP-19218
> Project: Hadoop Common
> Issue Type: Improvement
> Components: ipc
> Reporter: Viraj Jasani
> Assignee: Viraj Jasani
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
>
> Been running HADOOP-18628 in production for quite some time; everything works fine as long as the DNS servers in HA are available. Upgrading a single NS server at a time is also a common case, and not problematic. Every DNS lookup takes ~1ms in general.
> However, we recently encountered a case where 2 out of 4 NS servers went down (temporarily, but it's a rare case). With a short-lived DNS cache and 2s of NS fallback timeout configured in resolv.conf, any client performing a DNS lookup can encounter a 4s+ delay. This caused a namenode outage, as the listener thread is single threaded and was not able to keep up with the large number of unique clients (in direct proportion to the number of DNS resolutions every few seconds) initiating connections on the listener port.
> While having 2 out of 4 DNS servers offline is a rare case, and the NS fallback settings could also be improved, it is important to note that we don't need to perform DNS resolution for every new connection if the intention is to improve the insight into VersionMismatch errors thrown by the server.
> The proposal is to delay the DNS resolution until the server throws the error for an incompatible header or version mismatch. This would also avoid the ~1ms extra time spent even on a healthy DNS lookup.
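A minimal sketch of the deferred-resolution idea (hypothetical shape, not the actual patch): keep the address unresolved when the Connection object is created, and only pay for the DNS lookup on the error path that needs the hostname.

{code:java}
// Sketch only: resolution happens when building the version-mismatch
// message, not when the connection key is created.
import java.net.InetSocketAddress;

public class LazyResolveSketch {
  static String describePeer(InetSocketAddress addr) {
    // this is where the DNS lookup is finally triggered
    InetSocketAddress resolved =
        new InetSocketAddress(addr.getHostName(), addr.getPort());
    return "server " + resolved + " has incompatible version";
  }

  public static void main(String[] args) {
    // no lookup here: the connection keeps the unresolved form
    InetSocketAddress peer =
        InetSocketAddress.createUnresolved("nn1.example.com", 8020);
    System.out.println(describePeer(peer));
  }
}
{code}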
[jira] [Created] (HADOOP-19227) ipc.Server accelerate token negotiation only for the default mechanism
Tsz-wo Sze created HADOOP-19227:
---
Summary: ipc.Server accelerate token negotiation only for the default mechanism
Key: HADOOP-19227
URL: https://issues.apache.org/jira/browse/HADOOP-19227
Project: Hadoop Common
Issue Type: Improvement
Components: ipc
Reporter: Tsz-wo Sze
Assignee: Tsz-wo Sze

{code}
//Server.java
// accelerate token negotiation by sending initial challenge
// in the negotiation response
if (enabledAuthMethods.contains(AuthMethod.TOKEN)) {
  ...
}
{code}
In Server.Connection.buildSaslNegotiateResponse() above, it accelerates token negotiation by sending an initial challenge in the negotiation response. However, this is non-standard SASL negotiation. We should do it only for the default SASL mechanism.
[jira] [Created] (HADOOP-19226) ABFS: Implementing Azure Rest APIs on Blob Endpoint for AbfsBlobClient
Anuj Modi created HADOOP-19226:
--
Summary: ABFS: Implementing Azure Rest APIs on Blob Endpoint for AbfsBlobClient
Key: HADOOP-19226
URL: https://issues.apache.org/jira/browse/HADOOP-19226
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/azure
Affects Versions: 3.4.0
Reporter: Anuj Modi

This is the second task in a series of tasks for implementing Blob endpoint support for FNS accounts. This patch will have changes to implement all the APIs over the Blob endpoint as part of implementing AbfsBlobClient.
[jira] [Created] (HADOOP-19225) Upgrade to jetty 9.4.55 due to CVE
Palakur Eshwitha Sai created HADOOP-19225:
-
Summary: Upgrade to jetty 9.4.55 due to CVE
Key: HADOOP-19225
URL: https://issues.apache.org/jira/browse/HADOOP-19225
Project: Hadoop Common
Issue Type: Improvement
Reporter: Palakur Eshwitha Sai
Assignee: Palakur Eshwitha Sai
[jira] [Resolved] (HADOOP-19222) Switch yum repo baseurl due to CentOS 7 sunset
[ https://issues.apache.org/jira/browse/HADOOP-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shilun Fan resolved HADOOP-19222.
-
Hadoop Flags: Reviewed
Resolution: Fixed

> Switch yum repo baseurl due to CentOS 7 sunset
> ----------------------------------------------
>
> Key: HADOOP-19222
> URL: https://issues.apache.org/jira/browse/HADOOP-19222
> Project: Hadoop Common
> Issue Type: Bug
> Components: build
> Affects Versions: 3.4.0, 3.4.1
> Reporter: Cheng Pan
> Assignee: Cheng Pan
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
> Similar to HADOOP-18151 (which handled sunset for CentOS 8), CentOS 7 reached EOL on July 1, 2024
[jira] [Created] (HADOOP-19224) Upgrade esdk to the latest version 3.24.3
melin created HADOOP-19224:
--
Summary: Upgrade esdk to the latest version 3.24.3
Key: HADOOP-19224
URL: https://issues.apache.org/jira/browse/HADOOP-19224
Project: Hadoop Common
Issue Type: Improvement
Components: fs/huawei
Reporter: melin

The current version relies on okhttp 3.x; we would like to upgrade to the latest version, which relies on okhttp 4.12.
[jira] [Created] (HADOOP-19223) Don't fail CI if no tests are changed
Cheng Pan created HADOOP-19223:
--
Summary: Don't fail CI if no tests are changed
Key: HADOOP-19223
URL: https://issues.apache.org/jira/browse/HADOOP-19223
Project: Hadoop Common
Issue Type: Wish
Reporter: Cheng Pan
[jira] [Created] (HADOOP-19222) Switch yum repo baseurl due to CentOS 7 sunset
Cheng Pan created HADOOP-19222:
--
Summary: Switch yum repo baseurl due to CentOS 7 sunset
Key: HADOOP-19222
URL: https://issues.apache.org/jira/browse/HADOOP-19222
Project: Hadoop Common
Issue Type: Bug
Reporter: Cheng Pan

Similar to HADOOP-18151 (which handled the sunset of CentOS 8), CentOS 7 reached EOL on July 1, 2024.
[jira] [Resolved] (HADOOP-13463) update to Guice 4.1
[ https://issues.apache.org/jira/browse/HADOOP-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Cheng Pan resolved HADOOP-13463.
Resolution: Won't Do

Replaced with HADOOP-19216

> update to Guice 4.1
> -------------------
>
> Key: HADOOP-13463
> URL: https://issues.apache.org/jira/browse/HADOOP-13463
> Project: Hadoop Common
> Issue Type: Improvement
> Components: build
> Affects Versions: 3.0.0-alpha1
> Reporter: Sean Busbey
> Priority: Minor
>
> Right now trunk uses Guice 4.0, which is about a year old. We should update to 4.1, so long as we're making the jump from 3 to 4 in the branch-2 -> 3.0 transition.
[jira] [Resolved] (HADOOP-19220) S3A : S3AInputStream positioned readFully Expectation
[ https://issues.apache.org/jira/browse/HADOOP-19220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-19220.
-
Resolution: Works for Me

It works for me, and for the people whose support calls would ruin my life if it didn't work for them. You have probably done something in your mocking test setup that does not match what s3afs does. My recommendation is: step through the failing test with a debugger.

I'm not going to look at the code, because the way to do anything like that would be to share it as a github reference. But anyway, this is not a jira-class issue; not yet, anyway. This is the kind of problem to raise on the developer mailing list. For that reason, I'm going to close it as a WORKSFORME. Sorry.

Stick the code on github as a gist or something and discuss on the hadoop developer list. If it really is a bug in the s3a fs code, this jira can be re-opened.

> S3A : S3AInputStream positioned readFully Expectation
> ------------------------------------------------------
>
> Key: HADOOP-19220
> URL: https://issues.apache.org/jira/browse/HADOOP-19220
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs/s3
> Reporter: Vinay Devadiga
> Priority: Major
>
> So basically I was trying to write some unit tests for the S3AInputStream readFully method (the full test source is quoted in the HADOOP-19220 entry below).
[jira] [Created] (HADOOP-19221) S3a: retry on 400 +ErrorCode RequestTimeout
Steve Loughran created HADOOP-19221:
---
Summary: S3a: retry on 400 +ErrorCode RequestTimeout
Key: HADOOP-19221
URL: https://issues.apache.org/jira/browse/HADOOP-19221
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran

If a slow block update takes too long, the connection is broken on the S3 side with an error message in a 400 response:
{code}
Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID:; S3 Extended Request ID:
{code}
This is recoverable and should be treated as such, either using the normal exception policy or maybe even the throttlePolicy.
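The classification being asked for is roughly the following; a hedged sketch against the v2 SDK exception type, not the actual s3a retry policy code:

{code:java}
// Sketch of the check (hypothetical helper): treat 400 responses whose
// error code is "RequestTimeout" as retryable rather than as a fatal
// bad-request failure.
import software.amazon.awssdk.awscore.exception.AwsServiceException;

public final class RetrySketch {
  static boolean isRequestTimeout(AwsServiceException e) {
    return e.statusCode() == 400
        && e.awsErrorDetails() != null
        && "RequestTimeout".equals(e.awsErrorDetails().errorCode());
  }
}
{code}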
[jira] [Resolved] (HADOOP-19195) Upgrade aws sdk v2 to 2.25.53
[ https://issues.apache.org/jira/browse/HADOOP-19195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-19195.
-
Fix Version/s: 3.4.1
Resolution: Fixed

Merged to the 3.4 and trunk branches.

Harshit, can you leave the "fix version" field blank; use target version to indicate which version it is aimed at. We use the fix version to track which versions it has actually been merged into, and for the automated release note generation. Thanks.

> Upgrade aws sdk v2 to 2.25.53
> -----------------------------
>
> Key: HADOOP-19195
> URL: https://issues.apache.org/jira/browse/HADOOP-19195
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 3.5.0, 3.4.1
> Reporter: Harshit Gupta
> Assignee: Harshit Gupta
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
> Upgrade aws sdk v2 to 2.25.53
[jira] [Created] (HADOOP-19220) S3A : S3AInputStream positioned readFully Expectation
Vinay Devadiga created HADOOP-19220:
---
Summary: S3A : S3AInputStream positioned readFully Expectation
Key: HADOOP-19220
URL: https://issues.apache.org/jira/browse/HADOOP-19220
Project: Hadoop Common
Issue Type: Bug
Components: fs/s3
Reporter: Vinay Devadiga

So basically I was trying to write some unit tests for the S3AInputStream readFully method:

{code:java}
package org.apache.hadoop.fs.s3a;

import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketException;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.audit.impl.NoopSpan;
import org.apache.hadoop.fs.s3a.auth.delegation.EncryptionSecrets;
import org.apache.hadoop.util.BlockingThreadPoolExecutorService;
import org.apache.hadoop.util.functional.CallableRaisingIOE;
import org.assertj.core.api.Assertions;
import org.junit.Before;
import org.junit.Test;
import software.amazon.awssdk.awscore.exception.AwsErrorDetails;
import software.amazon.awssdk.awscore.exception.AwsServiceException;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.http.AbortableInputStream;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

import static java.lang.Math.min;
import static java.nio.charset.StandardCharsets.UTF_8;
import static org.apache.hadoop.fs.s3a.Constants.ASYNC_DRAIN_THRESHOLD;
import static org.apache.hadoop.fs.s3a.Constants.AWS_REGION;
import static org.apache.hadoop.fs.s3a.Constants.FS_S3A;
import static org.apache.hadoop.fs.s3a.Constants.MULTIPART_MIN_SIZE;
import static org.apache.hadoop.fs.s3a.Constants.S3_CLIENT_FACTORY_IMPL;
import static org.apache.hadoop.util.functional.FutureIO.eval;
import static org.assertj.core.api.Assertions.assertThat;
import static org.assertj.core.api.AssertionsForClassTypes.assertThatExceptionOfType;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.verify;

public class TestReadFullyAndPositionalRead {

  private S3AFileSystem fs;
  private S3AInputStream input;
  private S3Client s3;

  private static final String EMPTY = "";
  private static final String INPUT = "test_content";

  @Before
  public void setUp() throws IOException {
    Configuration conf = createConfiguration();
    fs = new S3AFileSystem();
    URI uri = URI.create(FS_S3A + "://" + MockS3AFileSystem.BUCKET);
    // Unset S3CSE property from config to avoid pathIOE.
    conf.unset(Constants.S3_ENCRYPTION_ALGORITHM);
    fs.initialize(uri, conf);
    s3 = fs.getS3AInternals().getAmazonS3Client("mocking");
  }

  public Configuration createConfiguration() {
    Configuration conf = new Configuration();
    conf.setClass(S3_CLIENT_FACTORY_IMPL, MockS3ClientFactory.class, S3ClientFactory.class);
    // use minimum multipart size for faster triggering
    conf.setLong(Constants.MULTIPART_SIZE, MULTIPART_MIN_SIZE);
    conf.setInt(Constants.S3A_BUCKET_PROBE, 1);
    // this is so stream draining is always blocking, allowing assertions
    // to be safely made without worrying about any race conditions
    conf.setInt(ASYNC_DRAIN_THRESHOLD, Integer.MAX_VALUE);
    // set the region to avoid the getBucketLocation on FS init.
    conf.set(AWS_REGION, "eu-west-1");
    return conf;
  }

  @Test
  public void testReadFullyFromBeginning() throws IOException {
    input = getMockedS3AInputStream(INPUT);
    byte[] byteArray = new byte[INPUT.length()];
    input.readFully(0, byteArray, 0, byteArray.length);
    assertThat(new String(byteArray, UTF_8)).isEqualTo(INPUT);
  }

  @Test
  public void testReadFullyWithOffsetAndLength() throws IOException {
    input = getMockedS3AInputStream(INPUT);
    byte[] byteArray = new byte[4];
    input.readFully(5, byteArray, 0, 4);
    assertThat(new String(byteArray, UTF_8)).isEqualTo("cont");
  }

  @Test
  public void testReadFullyWithOffsetBeyondStream() throws IOException {
    input = getMockedS3AInputStream(INPUT);
    byte[] byteArray = new byte[10];
    assertThatExceptionOfType(EOFException.class)
        .isThrownBy(() -> input.readFully(20, byteArray, 0, 10));
  }

  private S3AInputStream getMockedS3AInputStream(String input) {
    Path path = new Path(
{code}
[jira] [Resolved] (HADOOP-19216) Upgrade Guice from 4.0 to 5.1.0 to support Java 17
[ https://issues.apache.org/jira/browse/HADOOP-19216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena resolved HADOOP-19216.
---
Fix Version/s: 3.5.0
Hadoop Flags: Reviewed
Resolution: Fixed

> Upgrade Guice from 4.0 to 5.1.0 to support Java 17
> --------------------------------------------------
>
> Key: HADOOP-19216
> URL: https://issues.apache.org/jira/browse/HADOOP-19216
> Project: Hadoop Common
> Issue Type: Task
> Reporter: Cheng Pan
> Assignee: Cheng Pan
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0
[jira] [Resolved] (HADOOP-19205) S3A initialization/close slower than with v1 SDK
[ https://issues.apache.org/jira/browse/HADOOP-19205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-19205.
-
Fix Version/s: 3.5.0, 3.4.1
Resolution: Fixed

> S3A initialization/close slower than with v1 SDK
> -------------------------------------------------
>
> Key: HADOOP-19205
> URL: https://issues.apache.org/jira/browse/HADOOP-19205
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Steve Loughran
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
> Attachments: Screenshot 2024-06-14 at 17.12.59.png, Screenshot 2024-06-14 at 17.14.33.png
>
> Hive QE have observed a slowdown in LLAP queries due to the time to create and close s3a filesystem instances. A key aspect of that is they keep closing the fs instances (HIVE-27884), but looking at the profiles, the reason things seem to have regressed is
> * two s3 clients are being created (sync and async)
> * these seem to take a lot of time scanning the classpath for "global interceptors", which is at least an O(jars) operation; the number of index entries in the zip files may be a factor too.
> Proposed:
> * create the async client on demand when the transfer manager is invoked
> * look at why passwords are being scanned for if InstanceProfileCredentialsProvider is in use...that seems slow too
> SDK wishes:
> * SDK maybe allow us to turn off that scan for interceptors?
> attaching screenshots of the profile. storediag snippet:
> {code}
> [001] fs.s3a.access.key = (unset)
> [002] fs.s3a.secret.key = (unset)
> [003] fs.s3a.session.token = (unset)
> [004] fs.s3a.server-side-encryption-algorithm = (unset)
> [005] fs.s3a.server-side-encryption.key = (unset)
> [006] fs.s3a.encryption.algorithm = (unset)
> [007] fs.s3a.encryption.key = (unset)
> [008] fs.s3a.aws.credentials.provider = "com.amazonaws.auth.InstanceProfileCredentialsProvider" [core-site.xml]
> {code}
[jira] [Resolved] (HADOOP-19215) Fix unit tests testSlowConnection and testBadSetup failed in TestRPC
[ https://issues.apache.org/jira/browse/HADOOP-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ayush Saxena resolved HADOOP-19215.
---
Fix Version/s: 3.5.0
Hadoop Flags: Reviewed
Resolution: Fixed

> Fix unit tests testSlowConnection and testBadSetup failed in TestRPC
> --------------------------------------------------------------------
>
> Key: HADOOP-19215
> URL: https://issues.apache.org/jira/browse/HADOOP-19215
> Project: Hadoop Common
> Issue Type: Bug
> Components: test
> Affects Versions: 3.4.0
> Reporter: farmmamba
> Assignee: farmmamba
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.5.0
>
> Fix the unit tests testSlowConnection and testBadSetup failing in TestRPC.
> We should use ProtobufRpcEngine2 as the ProtocolEngine.
[jira] [Created] (HADOOP-19219) Resolve Certificate error in Hadoop-auth tests.
Muskan Mishra created HADOOP-19219:
--
Summary: Resolve Certificate error in Hadoop-auth tests.
Key: HADOOP-19219
URL: https://issues.apache.org/jira/browse/HADOOP-19219
Project: Hadoop Common
Issue Type: Sub-task
Reporter: Muskan Mishra

While compiling Hadoop trunk with JDK 17, we faced the following errors in the TestMultiSchemeAuthenticationHandler and TestLdapAuthenticationHandler classes.
{code:java}
[INFO] Running org.apache.hadoop.security.authentication.server.TestMultiSchemeAuthenticationHandler
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.256 s <<< FAILURE! - in org.apache.hadoop.security.authentication.server.TestMultiSchemeAuthenticationHandler
[ERROR] org.apache.hadoop.security.authentication.server.TestMultiSchemeAuthenticationHandler  Time elapsed: 1.255 s <<< ERROR!
java.lang.IllegalAccessError: class org.apache.directory.server.core.security.CertificateUtil (in unnamed module @0x32e614e9) cannot access class sun.security.x509.X500Name (in module java.base) because module java.base does not export sun.security.x509 to unnamed module @0x32e614e9
	at org.apache.directory.server.core.security.CertificateUtil.createTempKeyStore(CertificateUtil.java:334)
	at org.apache.directory.server.factory.ServerAnnotationProcessor.instantiateLdapServer(ServerAnnotationProcessor.java:158)
	at org.apache.directory.server.factory.ServerAnnotationProcessor.createLdapServer(ServerAnnotationProcessor.java:318)
	at org.apache.directory.server.factory.ServerAnnotationProcessor.createLdapServer(ServerAnnotationProcessor.java:351)
{code}
[jira] [Created] (HADOOP-19218) Avoid DNS lookup while creating IPC Connection object
Viraj Jasani created HADOOP-19218:
-
Summary: Avoid DNS lookup while creating IPC Connection object
Key: HADOOP-19218
URL: https://issues.apache.org/jira/browse/HADOOP-19218
Project: Hadoop Common
Issue Type: Improvement
Reporter: Viraj Jasani

Been running HADOOP-18628 in production for quite some time; everything works fine as long as the DNS servers in HA are available. Upgrading a single NS server at a time is also a common case, and not problematic.
However, we recently encountered a case where 2 out of 4 NS servers went down (temporarily, but it's a rare case). With a short-lived DNS cache and 2s of NS fallback timeout configured in resolv.conf, any client performing a DNS lookup can encounter a 4s+ delay. This caused a namenode outage, as the listener thread is single threaded and was not able to keep up with the large number of unique clients (in direct proportion to the number of DNS resolutions every few seconds) initiating connections on the listener port.
While having 2 out of 4 DNS servers offline is a rare case, and the NS fallback settings could also be improved, it is important to note that we don't need to perform DNS resolution for every new connection if the intention is to improve the insight into VersionMismatch errors thrown by the server.
The proposal is to delay the DNS resolution until the server throws the error for an incompatible header or version mismatch.
[jira] [Resolved] (HADOOP-19210) s3a: TestS3AAWSCredentialsProvider and TestS3AInputStreamRetry really slow
[ https://issues.apache.org/jira/browse/HADOOP-19210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-19210.
-
Fix Version/s: 3.5.0, 3.4.1
Resolution: Fixed

> s3a: TestS3AAWSCredentialsProvider and TestS3AInputStreamRetry really slow
> --------------------------------------------------------------------------
>
> Key: HADOOP-19210
> URL: https://issues.apache.org/jira/browse/HADOOP-19210
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3, test
> Affects Versions: 3.4.0, 3.5.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Priority: Major
> Fix For: 3.5.0, 3.4.1
>
> Not noticed this before, but the unit tests TestS3AAWSCredentialsProvider and TestS3AInputStreamRetry are so slow they will be hurting overall test performance: no integration tests will start until these are all complete.
> {code}
> mvn test -T 1C -Dparallel-tests
> ...
> [INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 37.877 s - in org.apache.hadoop.fs.s3a.TestS3AInputStreamRetry
> ...
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 90.038 s - in org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider
> {code}
> The PR cuts total execution time of a 10 thread test run from 3 minutes to 2:30
[jira] [Created] (HADOOP-19217) Introduce getTrashPolicy to FileSystem API
Ivan Andika created HADOOP-19217:
Summary: Introduce getTrashPolicy to FileSystem API
Key: HADOOP-19217
URL: https://issues.apache.org/jira/browse/HADOOP-19217
Project: Hadoop Common
Issue Type: Improvement
Components: fs
Reporter: Ivan Andika

Hadoop FileSystem supports awareness of multiple FileSystem implementations (e.g. a client aware of both hdfs:// and ofs:// protocols). However, the Hadoop TrashPolicy currently remains the same regardless of the URI scheme. The TrashPolicy is governed by the "fs.trash.classname" configuration and stays the same regardless of the FileSystem implementation. For example, HDFS defaults to TrashPolicyDefault and Ozone defaults to TrashPolicyOzone, but only one will be picked since the configuration will be overwritten by the other.
Therefore, I propose to couple the TrashPolicy to each specific FileSystem implementation by introducing a new FileSystem#getTrashPolicy. TrashPolicy#getInstance can call FileSystem#getTrashPolicy to get the appropriate TrashPolicy.
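A rough sketch of the proposal under the stated assumptions; the method name and wiring are not an agreed API, just an illustration of per-FileSystem trash policy selection:

{code:java}
// Sketch only: stand-in for a hypothetical FileSystem#getTrashPolicy
// default method. Each FileSystem could override the class selection;
// the default keeps today's behaviour of one global fs.trash.classname key.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.TrashPolicy;
import org.apache.hadoop.fs.TrashPolicyDefault;
import org.apache.hadoop.util.ReflectionUtils;

public final class PerSchemeTrashPolicySketch {
  static TrashPolicy getTrashPolicy(FileSystem fs, Configuration conf) {
    Class<? extends TrashPolicy> cls = conf.getClass(
        "fs.trash.classname", TrashPolicyDefault.class, TrashPolicy.class);
    TrashPolicy policy = ReflectionUtils.newInstance(cls, conf);
    policy.initialize(conf, fs);
    return policy;
  }
}
{code}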
[jira] [Created] (HADOOP-19216) Upgrade Guice from 4.0 to 5.1.0 to support Java 17
Cheng Pan created HADOOP-19216:
--
Summary: Upgrade Guice from 4.0 to 5.1.0 to support Java 17
Key: HADOOP-19216
URL: https://issues.apache.org/jira/browse/HADOOP-19216
Project: Hadoop Common
Issue Type: Task
Reporter: Cheng Pan
[jira] [Created] (HADOOP-19215) Fix unit tests testSlowConnection and testBadSetup failed in TestRPC
farmmamba created HADOOP-19215:
--
Summary: Fix unit tests testSlowConnection and testBadSetup failed in TestRPC
Key: HADOOP-19215
URL: https://issues.apache.org/jira/browse/HADOOP-19215
Project: Hadoop Common
Issue Type: Bug
Components: test
Affects Versions: 3.4.0
Reporter: farmmamba
Assignee: farmmamba

Fix the unit tests testSlowConnection and testBadSetup failing in TestRPC. We should use ProtobufRpcEngine2 as the ProtocolEngine.
[jira] [Created] (HADOOP-19214) Invalid GPG commands in Releases page
Attila Doroszlai created HADOOP-19214:
-
Summary: Invalid GPG commands in Releases page
Key: HADOOP-19214
URL: https://issues.apache.org/jira/browse/HADOOP-19214
Project: Hadoop Common
Issue Type: Bug
Components: website
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai

Instructions on the [Download page|https://hadoop.apache.org/releases.html] show GPG commands with {{--}} converted to {{–}}, which makes the commands invalid.
{code}
gpg –import KEYS
gpg –verify hadoop-X.Y.Z-src.tar.gz.asc
{code}
[jira] [Created] (HADOOP-19213) testUpdateDeepDirectoryStructureToRemote intermittent failures
Pranav Saxena created HADOOP-19213:
--
Summary: testUpdateDeepDirectoryStructureToRemote intermittent failures
Key: HADOOP-19213
URL: https://issues.apache.org/jira/browse/HADOOP-19213
Project: Hadoop Common
Issue Type: Bug
Components: tools/distcp
Reporter: Pranav Saxena

The test testUpdateDeepDirectoryStructureToRemote intermittently fails. Following is an instance in ABFS test runs:
```
[ERROR] testUpdateDeepDirectoryStructureToRemote(org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractDistCp)  Time elapsed: 2.951 s  <<< FAILURE!
java.lang.AssertionError: Files Copied value 2 above maximum 1
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at org.apache.hadoop.tools.contract.AbstractContractDistCpTest.assertCounterInRange(AbstractContractDistCpTest.java:294)
	at org.apache.hadoop.tools.contract.AbstractContractDistCpTest.distCpUpdateDeepDirectoryStructure(AbstractContractDistCpTest.java:334)
	at org.apache.hadoop.tools.contract.AbstractContractDistCpTest.testUpdateDeepDirectoryStructureToRemote(AbstractContractDistCpTest.java:259)
```
There is one JIRA in Apache Ozone which was raised for an S3 test run: https://issues.apache.org/jira/browse/HDDS-10616
```
org.apache.hadoop.fs.contract.s3a.ITestS3AContractDistCp
testUpdateDeepDirectoryStructureToRemote(org.apache.hadoop.fs.contract.s3a.ITestS3AContractDistCp)  Time elapsed: 2.375 s  <<< FAILURE!
java.lang.AssertionError: Files Copied value 2 above maximum 1
	at org.junit.Assert.fail(Assert.java:89)
	at org.junit.Assert.assertTrue(Assert.java:42)
	at org.apache.hadoop.tools.contract.AbstractContractDistCpTest.assertCounterInRange(AbstractContractDistCpTest.java:294)
	at org.apache.hadoop.tools.contract.AbstractContractDistCpTest.distCpUpdateDeepDirectoryStructure(AbstractContractDistCpTest.java:334)
	at org.apache.hadoop.tools.contract.AbstractContractDistCpTest.testUpdateDeepDirectoryStructureToRemote(AbstractContractDistCpTest.java:259)
```
[jira] [Created] (HADOOP-19211) AliyunOSS: Support vectored read API
wujinhu created HADOOP-19211:
Summary: AliyunOSS: Support vectored read API
Key: HADOOP-19211
URL: https://issues.apache.org/jira/browse/HADOOP-19211
Project: Hadoop Common
Issue Type: Improvement
Components: fs/oss
Affects Versions: 3.3.6, 3.2.4
Reporter: wujinhu
Assignee: wujinhu
[jira] [Created] (HADOOP-19210) s3a: TestS3AAWSCredentialsProvider and TestS3AInputStreamRetry really slow
Steve Loughran created HADOOP-19210:
---
Summary: s3a: TestS3AAWSCredentialsProvider and TestS3AInputStreamRetry really slow
Key: HADOOP-19210
URL: https://issues.apache.org/jira/browse/HADOOP-19210
Project: Hadoop Common
Issue Type: Sub-task
Components: fs/s3, test
Affects Versions: 3.5.0
Reporter: Steve Loughran

Not noticed this before, but the unit tests TestS3AAWSCredentialsProvider and TestS3AInputStreamRetry are so slow they will be hurting overall test performance: no integration tests will start until these are all complete.
{code}
mvn test -T 1C -Dparallel-tests
...
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 37.877 s - in org.apache.hadoop.fs.s3a.TestS3AInputStreamRetry
...
[INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 90.038 s - in org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider
{code}
[jira] [Resolved] (HADOOP-19194) Add test to find unshaded dependencies in the aws sdk
[ https://issues.apache.org/jira/browse/HADOOP-19194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Steve Loughran resolved HADOOP-19194.
-
Fix Version/s: 3.4.1
Resolution: Fixed

This highlights how many unshaded artifacts are in that bundle.jar, the one we use precisely to avoid classpath problems, especially with the aws sdk trying to dictate the jackson library.

h2. Should we give up shipping it?
yes: it's tainted; things like netty are still there
no: at least jackson is shaded

I'm not happy about slf4j or netty classes.

> Add test to find unshaded dependencies in the aws sdk
> ------------------------------------------------------
>
> Key: HADOOP-19194
> URL: https://issues.apache.org/jira/browse/HADOOP-19194
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Affects Versions: 3.4.0
> Reporter: Harshit Gupta
> Assignee: Harshit Gupta
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
> Write a test to assess the aws sdk for unshaded artefacts on the class path which might cause deployment failures.
[jira] [Resolved] (HADOOP-19204) VectorIO regression: empty ranges are now rejected
[ https://issues.apache.org/jira/browse/HADOOP-19204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19204. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > VectorIO regression: empty ranges are now rejected > -- > > Key: HADOOP-19204 > URL: https://issues.apache.org/jira/browse/HADOOP-19204 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs >Affects Versions: 3.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > The validation now rejects a readvectored with an empty range, whereas before > it was a no-op > Proposed fix, return the empty list; add test -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19206) Hadoop release contains a 530MB bundle-2.23.19.jar
[ https://issues.apache.org/jira/browse/HADOOP-19206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tsz-wo Sze resolved HADOOP-19206. - Resolution: Duplicate Resolving this as a duplicate of HADOOP-19083. > Hadoop release contains a 530MB bundle-2.23.19.jar > -- > > Key: HADOOP-19206 > URL: https://issues.apache.org/jira/browse/HADOOP-19206 > Project: Hadoop Common > Issue Type: Improvement > Components: build >Reporter: Tsz-wo Sze >Priority: Major > > The size of Hadoop binary release (v3.4.0) is 1.7 GB. > {code:java} > hadoop-3.4.0$du -h -d 1 > $du -h -d 1 . > 2.0M ./bin > 260K ./libexec > 72K ./include > 212K ./sbin > 184K ./etc > 232K ./licenses-binary > 316M ./lib > 1.4G ./share > 1.7G . > {code} > A large component is bundle-2.23.19.jar, which is [AWS Java SDK :: > Bundle|https://mvnrepository.com/artifact/software.amazon.awssdk/bundle/2.23.19] > {code:java} > hadoop-3.4.0$ls -lh share/hadoop/tools/lib/bundle-2.23.19.jar > -rw-r--r--@ 1 szetszwo staff 530M Mar 4 15:41 > share/hadoop/tools/lib/bundle-2.23.19.jar > {code} > We should revisit if such a large jar is really needed to be included in the > release. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19203) WrappedIO BulkDelete API to raise IOEs as UncheckedIOExceptions
[ https://issues.apache.org/jira/browse/HADOOP-19203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19203. - Fix Version/s: 3.4.1 Resolution: Fixed > WrappedIO BulkDelete API to raise IOEs as UncheckedIOExceptions > --- > > Key: HADOOP-19203 > URL: https://issues.apache.org/jira/browse/HADOOP-19203 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 3.4.1 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > It's easier to invoke methods through reflection via parquet/iceberg > DynMethods if the invoked method raises unchecked exceptions, because it > doesn't then rewrap the raised exception in a generic RuntimeException. > Catching the IOEs and wrapping as UncheckedIOEs makes it much easier to > unwrap IOEs after the invocation -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19208) ABFS: Fixing logic to determine HNS nature of account to avoid extra getAcl() calls
Anuj Modi created HADOOP-19208: -- Summary: ABFS: Fixing logic to determine HNS nature of account to avoid extra getAcl() calls Key: HADOOP-19208 URL: https://issues.apache.org/jira/browse/HADOOP-19208 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure Affects Versions: 3.4.0 Reporter: Anuj Modi Fix For: 3.5.0, 3.4.1 ABFS driver needs to know the type of account being used. It relies on the user to inform the account type using the config `fs.azure.account.hns.enabled`. If not configured, the driver makes a getAcl() call to determine the account type. The expectation is that getAcl() will fail with 400 Bad Request if made on an FNS account; any other case, including 200 and 404, indicates the account is HNS. Today, when determining this, the logic only checks for a status code of 200 or 400. In case of 404, nothing is inferred, and this leads to repeated getAcl() calls until a 200 or 400 comes back. The fix is to update the logic such that if getAcl() fails with 400, it is an FNS account; in all other cases it will be treated as an HNS account. In case of throttling, if all retries are exhausted, FS init itself will fail. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
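The corrected inference described above fits in a few lines. A minimal sketch under stated assumptions: the status codes come from the description, and the class and method names here are illustrative, not the actual AbfsClient internals.

{code:java}
// Illustrative sketch of the fixed account-type inference (HADOOP-19208).
// Only a 400 Bad Request from the getAcl() probe marks the account as FNS;
// every other outcome (200, 404, ...) is treated as HNS.
public final class HnsDetectionSketch {

  public static boolean isHnsAccount(int getAclStatusCode) {
    // 400: getAcl() is unsupported, so this is an FNS account.
    if (getAclStatusCode == 400) {
      return false;
    }
    // Any other status means the endpoint understood the ACL call,
    // so the account must be namespace (HNS) enabled.
    return true;
  }
}
{code}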
[jira] [Created] (HADOOP-19207) ABFS: [FnsOverBlob]Response Handling of Blob Endpoint APIs and Metadata APIs
Anuj Modi created HADOOP-19207: -- Summary: ABFS: [FnsOverBlob]Response Handling of Blob Endpoint APIs and Metadata APIs Key: HADOOP-19207 URL: https://issues.apache.org/jira/browse/HADOOP-19207 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure Affects Versions: 3.4.0 Reporter: Anuj Modi Fix For: 3.5.0, 3.4.1 Blob Endpoint APIs has a different format for response than DFS Endpoint APIs. There are some behavioral differences as well that need to be handled at client side. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19206) Hadoop release contains a 530MB bundle-2.23.19.jar
Tsz-wo Sze created HADOOP-19206: --- Summary: Hadoop release contains a 530MB bundle-2.23.19.jar Key: HADOOP-19206 URL: https://issues.apache.org/jira/browse/HADOOP-19206 Project: Hadoop Common Issue Type: Improvement Components: build Reporter: Tsz-wo Sze The size of Hadoop binary release (v3.4.0) is 1.7 GB. {code:java} hadoop-3.4.0$du -h -d 1 . 2.0M ./bin 260K ./libexec 72K ./include 212K ./sbin 184K ./etc 232K ./licenses-binary 316M ./lib 1.4G ./share 1.7G . {code} A large component is bundle-2.23.19.jar, which is [AWS Java SDK :: Bundle|https://mvnrepository.com/artifact/software.amazon.awssdk/bundle/2.23.19] {code:java} hadoop-3.4.0$ls -lh share/hadoop/tools/lib/bundle-2.23.19.jar -rw-r--r--@ 1 szetszwo staff 530M Mar 4 15:41 share/hadoop/tools/lib/bundle-2.23.19.jar {code} We should revisit if such a large jar is really needed to be included in the release. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18508) support multiple s3a integration test runs on same bucket in parallel
[ https://issues.apache.org/jira/browse/HADOOP-18508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18508. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > support multiple s3a integration test runs on same bucket in parallel > - > > Key: HADOOP-18508 > URL: https://issues.apache.org/jira/browse/HADOOP-18508 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3, test >Affects Versions: 3.3.9 >Reporter: Steve Loughran >Assignee: Steve Loughran >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > to have (internal, sorry) jenkins test runs work in parallel, they need to > share the same bucket so > # must have a prefix for job id which is passed in to the path used for forks > # support disabling root tests so they don't stamp on each other -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18931) FileSystem.getFileSystemClass() to log at debug the jar the .class came from
[ https://issues.apache.org/jira/browse/HADOOP-18931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18931. - Fix Version/s: 3.5.0 3.4.1 Assignee: Viraj Jasani Resolution: Fixed > FileSystem.getFileSystemClass() to log at debug the jar the .class came from > > > Key: HADOOP-18931 > URL: https://issues.apache.org/jira/browse/HADOOP-18931 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 3.3.6 >Reporter: Steve Loughran >Assignee: Viraj Jasani >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > we want to be able to log the jar the filesystem implementation class came > from, so that we can identify which version of a module the class belongs to. > this is to help track down problems where different machines in the cluster, > or the .tar.gz bundle, are out of date. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
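For readers wondering how a class's originating jar is found, the standard JDK route is the protection domain's code source. A small self-contained sketch of that mechanism (not the Hadoop patch itself, which wires this into FileSystem.getFileSystemClass()):

{code:java}
import java.net.URL;
import java.security.CodeSource;

// Resolve which jar (or class directory) a loaded class came from.
public final class ClassOrigin {

  public static String locationOf(Class<?> clazz) {
    CodeSource source = clazz.getProtectionDomain().getCodeSource();
    if (source == null) {
      // Bootstrap classes (e.g. java.lang.String) have no code source.
      return "(bootstrap classloader, no code source)";
    }
    URL location = source.getLocation();
    return location == null ? "(unknown)" : location.toString();
  }

  // Usage: java ClassOrigin org.apache.hadoop.fs.FileSystem
  public static void main(String[] args) throws ClassNotFoundException {
    Class<?> clazz = Class.forName(args[0]);
    System.out.println(args[0] + " loaded from " + locationOf(clazz));
  }
}
{code}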
[jira] [Resolved] (HADOOP-19192) Log level is WARN when fail to load native hadoop libs
[ https://issues.apache.org/jira/browse/HADOOP-19192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19192. - Fix Version/s: 3.5.0 3.4.1 Assignee: Cheng Pan Resolution: Fixed > Log level is WARN when fail to load native hadoop libs > -- > > Key: HADOOP-19192 > URL: https://issues.apache.org/jira/browse/HADOOP-19192 > Project: Hadoop Common > Issue Type: Improvement > Components: documentation >Affects Versions: 3.3.6 >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19205) S3A initialization/close slower than with v1 SDK
Steve Loughran created HADOOP-19205: --- Summary: S3A initialization/close slower than with v1 SDK Key: HADOOP-19205 URL: https://issues.apache.org/jira/browse/HADOOP-19205 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran Hive QE have observed slowdown in LLAP queries due to the time to create and close s3a filesystem instances. A key aspect of that is they keep closing the fs instances (HIVE-27884), but looking at the profiles, the reason things seem to have regressed is * two s3 clients are being created (sync and async) * these seem to take a lot of time scanning the classpath for "global interceptors", which is at least an O(jars) operation; #of index entries in the zip files may factor too. Proposed: * create async client on demand when the transfer manager is invoked * look at why passwords are being scanned for if InstanceProfileCredentialsProvider is in use...that seems slow too SDK wishes * SDK maybe allow us to turn off that scan for interceptors? attaching screenshots of the profile. storediag snippet: {code} [001] fs.s3a.access.key = (unset) [002] fs.s3a.secret.key = (unset) [003] fs.s3a.session.token = (unset) [004] fs.s3a.server-side-encryption-algorithm = (unset) [005] fs.s3a.server-side-encryption.key = (unset) [006] fs.s3a.encryption.algorithm = (unset) [007] fs.s3a.encryption.key = (unset) [008] fs.s3a.aws.credentials.provider = "com.amazonaws.auth.InstanceProfileCredentialsProvider" [core-site.xml] {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
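The "create async client on demand" proposal amounts to deferring an expensive constructor until first use. A generic sketch of that pattern, assuming nothing about the S3A internals beyond what the report above states:

{code:java}
import java.util.function.Supplier;

// Memoizing lazy reference: the factory runs at most once, on first get().
// Thread-safe via double-checked locking on a volatile field.
public final class LazyRef<T> {

  private final Supplier<T> factory;
  private volatile T instance;

  public LazyRef(Supplier<T> factory) {
    this.factory = factory;
  }

  public T get() {
    T local = instance;
    if (local == null) {
      synchronized (this) {
        local = instance;
        if (local == null) {
          // Pay the (slow) creation cost here, only when actually needed,
          // e.g. when the transfer manager is first invoked.
          instance = local = factory.get();
        }
      }
    }
    return local;
  }
}
{code}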
[jira] [Created] (HADOOP-19204) VectorIO regression: empty ranges are now rejected
Steve Loughran created HADOOP-19204: --- Summary: VectorIO regression: empty ranges are now rejected Key: HADOOP-19204 URL: https://issues.apache.org/jira/browse/HADOOP-19204 Project: Hadoop Common Issue Type: Sub-task Components: fs Affects Versions: 3.4.1 Reporter: Steve Loughran Assignee: Steve Loughran The validation now rejects a readvectored with an empty range, whereas before it was a no-op Proposed fix, return the empty list; add test -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
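To make the regression concrete, here is a hedged sketch of a caller hitting the empty-range case through the public vectored-read API; under the proposed fix this call would simply complete with nothing to fetch rather than fail validation:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileRange;

public class EmptyRangeRead {

  static void readNothing(FSDataInputStream in) throws IOException {
    // A deliberately empty range list: before the regression this was a
    // no-op; after it, validation rejects the call.
    List<FileRange> ranges = new ArrayList<>();
    in.readVectored(ranges, ByteBuffer::allocate);
  }
}
{code}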
[jira] [Created] (HADOOP-19203) WrappedIO BulkDelete API to raise IOEs as UncheckedIOExceptions
Steve Loughran created HADOOP-19203: --- Summary: WrappedIO BulkDelete API to raise IOEs as UncheckedIOExceptions Key: HADOOP-19203 URL: https://issues.apache.org/jira/browse/HADOOP-19203 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 3.4.1 Reporter: Steve Loughran It's easier to invoke methods through reflection via parquet/iceberg DynMethods if the invoked method raises unchecked exceptions, because it doesn't then rewrap the raised exception in a generic RuntimeException. Catching the IOEs and wrapping as UncheckedIOEs makes it much easier to unwrap IOEs after the invocation -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19196) Bulk delete api doesn't take the path to delete as the base path
[ https://issues.apache.org/jira/browse/HADOOP-19196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-19196. Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > Bulk delete api doesn't take the path to delete as the base path > > > Key: HADOOP-19196 > URL: https://issues.apache.org/jira/browse/HADOOP-19196 > Project: Hadoop Common > Issue Type: Bug > Components: fs >Affects Versions: 3.5.0, 3.4.1 >Reporter: Steve Loughran >Assignee: Mukund Thakur >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > If you use the path of the file you intend to delete as the base path, you > get an error. This is because the validation requires the list to be of > children, but the base path itself should be valid. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19137) [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if Customer-provided-key configs given.
[ https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-19137. Resolution: Fixed > [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if > Customer-provided-key configs given. > -- > > Key: HADOOP-19137 > URL: https://issues.apache.org/jira/browse/HADOOP-19137 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.4.0 >Reporter: Pranav Saxena >Assignee: Pranav Saxena >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > The store doesn't flow the namespace information to the client. > In https://github.com/apache/hadoop/pull/6221, getIsNamespaceEnabled is added > in client methods which checks if namespace information is there or not, and > if not there, it will make a getAcl call and set the field. Once the field is > set, it would be used in future getIsNamespaceEnabled method calls for a > given AbfsClient. > Since CPK, both global and encryptionContext, applies only to HNS accounts, the > proposed fix is to fail fs init if it is a non-HNS account and a CPK config is > given. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19202) EC: Support decommissioning DataNode by EC block reconstruction
Chenyu Zheng created HADOOP-19202: - Summary: EC: Support decommissioning DataNode by EC block reconstruction Key: HADOOP-19202 URL: https://issues.apache.org/jira/browse/HADOOP-19202 Project: Hadoop Common Issue Type: Improvement Reporter: Chenyu Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19199) Include FileStatus when opening a file from FileSystem
[ https://issues.apache.org/jira/browse/HADOOP-19199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19199. - Resolution: Duplicate Closing as a duplicate of HADOOP-15229. I absolutely agree the head requests are needless. Which is why we added exactly the feature you wanted in 2019, *five years ago*. And in HADOOP-16202, you only need to pass in the file length, so if you can store that in your manifests, then you can skip the HEAD call (s3a; abfs still needs it). The problem we have is therefore not that the Hadoop library lacks this, it is that libraries and applications haven't taken it up. Why not? Because they want to compile against versions of Hadoop that are over 10 years old. Which means that all the improvements we have done are wasted. Although private forks can do this, it's very hard to get this taken up consistently, and people like you and I suffer in wasted time and money. What can be done? Well, I have concluded that trying to get the projects to upgrade doesn't work, and waiting for the libraries to "get up-to-date" is a moving target as we are always trying to improve in this area. Instead, all our new work is being targeted at being "reflection-friendly" and expecting the initial take-up to be through reflection. In HADOOP-19131 I am exporting the existing openFile() API (which takes a builder and returns an asynchronously evaluated input stream) as an easy-to-reflect function {code} public static FSDataInputStream fileSystem_openFile( final FileSystem fs, final Path path, final String policy, final FileStatus status, final Long length, final Map<String, String> options) throws IOException { {code} The "policy" is also critical as it tells the storage layer what access policy you want, such as random or sequential. I'm going to add an explicit "parquet" policy here too, which hints to the library that footer caching would be good. What can you do then? Other than just waiting for this to happen? Help us get this through the stack. We need it in: parquet, iceberg, spark, avro. Can you start by reviewing HADOOP-19131 and seeing how well you think it will integrate *and anything you can do in terms of Proof of Concept PRs using this patch*, so we can identify problems before the hadoop patch is merged. > Include FileStatus when opening a file from FileSystem > -- > > Key: HADOOP-19199 > URL: https://issues.apache.org/jira/browse/HADOOP-19199 > Project: Hadoop Common > Issue Type: Improvement > Components: fs >Affects Versions: 3.4.0 >Reporter: Oliver Caballero Alvarez >Priority: Major > Labels: pull-request-available > > The FileSystem abstract class prevents you from using a FileStatus you > already have to open a file, which means that implementations of the open > method have to request the FileStatus of the same file again, making > unnecessary requests. > A very clear example is seen in today's latest version of the parquet-hadoop > implementation: > https://github.com/apache/parquet-java/blob/apache-parquet-1.14.0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopInputFile.java > Although to create the implementation you had to consult the file to know its > FileStatus, when opening it only the path is included, since a path is the > only thing the FileSystem implementation allows you to pass. This implies > that the implementation will surely, in its open function, verify that the > file exists or what information the file has, and perform the same operation > again to collect the FileStatus. > > This would simply be resolved by taking the latest current version: > > [https://github.com/apache/hadoop/blob/release-3.4.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java] > and including the following: > > public FSDataInputStream open(FileStatus f) throws IOException { > return this.open(f.getPath(), > this.getConf().getInt("io.file.buffer.size", 4096)); > } > > This would be backward compatible with all current FileSystems, but since it > is in the implementation it could be used when this information is already > known. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
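For anyone wanting to adopt the non-reflective route today, the existing builder API already covers the reporter's request. A usage sketch, assuming the fs.option.openfile.* option names documented for Hadoop 3.3+; treat the exact strings as assumptions to verify against your Hadoop version:

{code:java}
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenWithKnownStatus {

  static FSDataInputStream open(FileSystem fs, FileStatus status)
      throws Exception {
    return fs.openFile(status.getPath())
        // Pass the status you already hold so S3A can skip its HEAD probe.
        .withFileStatus(status)
        // Declare the access pattern; "random" suits columnar formats.
        .opt("fs.option.openfile.read.policy", "random")
        .build()     // returns CompletableFuture<FSDataInputStream>
        .get();      // block for the stream
  }
}
{code}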
[jira] [Resolved] (HADOOP-19200) Reduce the number of headObject when opening a file with the s3 file system
[ https://issues.apache.org/jira/browse/HADOOP-19200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19200. - Resolution: Duplicate > Reduce the number of headObject when opening a file with the s3 file system > --- > > Key: HADOOP-19200 > URL: https://issues.apache.org/jira/browse/HADOOP-19200 > Project: Hadoop Common > Issue Type: Improvement > Components: fs/s3 >Affects Versions: 3.4.0, 3.3.6 >Reporter: Oliver Caballero Alvarez >Priority: Major > > In the implementation of the S3 filesystem in the hadoop-aws package, if you > use it with spark, every time you open a file you will have to send two > HeadObject requests: to open the file, you will first check whether the file > exists, executing a HeadObject, and then when opening it the implementation, > in both sdk1 and sdk2, forces you to make a HeadObject again. This is not the > fault of the implementation of this class (S3AFileSystem), but of the abstract > FileSystem class of the Hadoop core, since it does not allow the FileStatus to > be passed but only allows the use of Path. > If the FileSystem implementation is changed, it could be used to not have to > request that HeadObject again. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19201) Support external id in assume role
Smith Cruise created HADOOP-19201: - Summary: Support external id in assume role Key: HADOOP-19201 URL: https://issues.apache.org/jira/browse/HADOOP-19201 Project: Hadoop Common Issue Type: Improvement Components: fs/s3 Affects Versions: 3.4.0 Reporter: Smith Cruise Fix For: 3.4.1 Support external id in AssumedRoleCredentialProvider.java -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
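At the SDK level, the requested feature is a single extra field on the STS call. A sketch of the call AssumedRoleCredentialProvider would ultimately issue, written against the AWS SDK v2 STS client directly (the session name is illustrative):

{code:java}
import software.amazon.awssdk.services.sts.StsClient;
import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;
import software.amazon.awssdk.services.sts.model.Credentials;

public class AssumeRoleWithExternalId {

  static Credentials assume(StsClient sts, String roleArn, String externalId) {
    AssumeRoleRequest request = AssumeRoleRequest.builder()
        .roleArn(roleArn)
        .roleSessionName("hadoop-s3a-session")  // illustrative name
        .externalId(externalId)                 // the field this JIRA adds
        .build();
    return sts.assumeRole(request).credentials();
  }
}
{code}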
[jira] [Created] (HADOOP-19200) Reduce the number of headObject when opening a file with the s3 file system
Oliver Caballero Alvarez created HADOOP-19200: - Summary: Reduce the number of headObject when opening a file with the s3 file system Key: HADOOP-19200 URL: https://issues.apache.org/jira/browse/HADOOP-19200 Project: Hadoop Common Issue Type: Improvement Components: fs/s3 Affects Versions: 3.3.6, 3.4.0 Reporter: Oliver Caballero Alvarez In the implementation of the S3 filesystem in the hadoop-aws package, if you use it with spark, every time you open a file you will have to send two HeadObject requests: to open the file, you will first check whether the file exists, executing a HeadObject, and then when opening it the implementation, in both sdk1 and sdk2, forces you to make a HeadObject again. This is not the fault of the implementation of this class (S3AFileSystem), but of the abstract FileSystem class of the Hadoop core, since it does not allow the FileStatus to be passed but only allows the use of Path. If the FileSystem implementation is changed, it could be used to not have to request that HeadObject again. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19199) Include FileStatus when opening a file from FileSystem
Oliver Caballero Alvarez created HADOOP-19199: - Summary: Include FileStatus when opening a file from FileSystem Key: HADOOP-19199 URL: https://issues.apache.org/jira/browse/HADOOP-19199 Project: Hadoop Common Issue Type: Improvement Components: fs Affects Versions: 3.4.0 Reporter: Oliver Caballero Alvarez The FileSystem abstract class prevents you from using a FileStatus you already have to open a file, which means that implementations of the open method have to request the FileStatus of the same file again, making unnecessary requests. A very clear example is seen in today's latest version of the parquet-hadoop implementation: https://github.com/apache/parquet-java/blob/apache-parquet-1.14.0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopInputFile.java Although to create the implementation you had to consult the file to know its FileStatus, when opening it only the path is included, since a path is the only thing the FileSystem implementation allows you to pass. This implies that the implementation will surely, in its open function, verify that the file exists or what information the file has, and perform the same operation again to collect the FileStatus. This would simply be resolved by taking the latest current version: [https://github.com/apache/hadoop/blob/release-3.4.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java] and including the following: public FSDataInputStream open(FileStatus f) throws IOException { return this.open(f.getPath(), this.getConf().getInt("io.file.buffer.size", 4096)); } This would be backward compatible with all current FileSystems, but since it is in the implementation it could be used when this information is already known. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18516) [ABFS]: Support fixed SAS token config in addition to Custom SASTokenProvider Implementation
[ https://issues.apache.org/jira/browse/HADOOP-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18516. - Fix Version/s: 3.4.1 Resolution: Fixed > [ABFS]: Support fixed SAS token config in addition to Custom SASTokenProvider > Implementation > > > Key: HADOOP-18516 > URL: https://issues.apache.org/jira/browse/HADOOP-18516 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.4.0 >Reporter: Sree Bhattacharyya >Assignee: Anuj Modi >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > This PR introduces a new configuration for Fixed SAS Tokens: > *"fs.azure.sas.fixed.token"* > Using this new configuration, users can configure a fixed SAS Token in the > account settings file itself. Ideally, this should be used with SAS Tokens > that are scoped at a container or account level (Service or Account SAS), > which can be considered to be a constant for one account or container, over > multiple operations. > The other method of using a SAS Token remains valid as well, where a user > provides a custom implementation of the SASTokenProvider interface, using > which a SAS Token is obtained. > When an Account SAS Token is configured as the fixed SAS Token, and it is > used, it is ensured that operations are within the scope of the SAS Token. > The code checks whether the fixed token and the token provider class > implementation are configured. In the case of both being set, preference is > given to the custom SASTokenProvider implementation. It must be noted that if > such an implementation provides a SAS Token which has a lower scope than > Account SAS, some filesystem and service level operations might be out of > scope and may not succeed. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19178) WASB Driver Deprecation and eventual removal
[ https://issues.apache.org/jira/browse/HADOOP-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19178. - Fix Version/s: 3.3.9 3.5.0 Assignee: Anuj Modi (was: Sneha Vijayarajan) Resolution: Fixed > WASB Driver Deprecation and eventual removal > > > Key: HADOOP-19178 > URL: https://issues.apache.org/jira/browse/HADOOP-19178 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.4.0 >Reporter: Sneha Vijayarajan >Assignee: Anuj Modi >Priority: Major > Labels: pull-request-available > Fix For: 3.3.9, 3.5.0, 3.4.1 > > > *WASB Driver* > The WASB driver was developed to support FNS (Flat Namespace) Azure Storage > accounts. FNS accounts do not honor file-folder syntax. HDFS folder > operations are hence mimicked at the client side by the WASB driver, and certain > folder operations like rename and delete can lead to a lot of IOPs, with > client-side enumeration and orchestration of the rename/delete operation blob by > blob. It was not ideal for other APIs either, as the initial check for whether a > path is a file or folder needs to be done over multiple metadata calls. These led > to degraded performance. > To provide better service to analytics customers, Microsoft released ADLS > Gen2, which is HNS (Hierarchical Namespace), i.e. file-folder aware, storage. > The ABFS driver was designed to overcome the inherent deficiencies of WASB, and > customers were informed to migrate to the ABFS driver. > *Customers who still use the legacy WASB driver and the challenges they face* > Some of our customers have not migrated to the ABFS driver yet and continue > to use the legacy WASB driver with FNS accounts. > These customers face the following challenges: > * They cannot leverage the optimizations and benefits of the ABFS driver. > * They need to deal with compatibility issues should the files and > folders be modified with the legacy WASB driver and the ABFS driver > concurrently in a phased transition situation. > * There are differences in supported features between FNS and HNS over the ABFS > driver. > * In certain cases, they must perform a significant amount of re-work on > their workloads to migrate to the ABFS driver, which is available only on HNS > enabled accounts in a fully tested and supported scenario. > *Deprecation plans for WASB* > We are introducing a new feature that will enable the ABFS driver to support > FNS accounts (over BlobEndpoint) using the ABFS scheme. This feature will > enable customers to use the ABFS driver to interact with data stored in GPv2 > (General Purpose v2) storage accounts. > With this feature, the customers who still use the legacy WASB driver will be > able to migrate to the ABFS driver without much re-work on their workloads. > They will however need to change the URIs from the WASB scheme to the ABFS > scheme. > Once the ABFS driver has built the FNS support capability needed to migrate WASB > customers, the WASB driver will be declared deprecated in OSS documentation and > marked for removal in the next major release. This will remove any ambiguity for > new customer onboarding as there will be only one Microsoft driver for Azure > Storage, and migrating customers will get SLA-bound support for driver and > service, which was not guaranteed over WASB. > We anticipate that this feature will serve as a stepping stone for customers > to move to HNS enabled accounts with the ABFS driver, which is our > recommended stack for big data analytics on ADLS Gen2. > *Any impact for existing customers who are using ADLS Gen2 (HNS enabled > account) with the ABFS driver?* > This feature does not impact the existing customers who are using ADLS Gen2 > (HNS enabled account) with the ABFS driver. > They do not need to make any changes to their workloads or configurations. > They will still enjoy the benefits of HNS, such as atomic operations, > fine-grained access control, scalability, and performance. > *Official recommendation* > Microsoft continues to recommend all Big Data and Analytics customers to use > Azure Data Lake Gen2 (ADLS Gen2) with the ABFS driver and will continue to > optimize this scenario in the future; we believe that this new option will help > all those customers to transition to a supported scenario immediately, while > they plan to ultimately move to ADLS Gen2 (HNS enabled account). > *New Authentication options that a WASB to ABFS Driver migrating customer > will get* > Below auth types that WASB provides will continue
[jira] [Resolved] (HADOOP-19114) upgrade to commons-compress 1.26.1 due to cves
[ https://issues.apache.org/jira/browse/HADOOP-19114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19114. - Fix Version/s: 3.5.0 3.4.1 Resolution: Fixed > upgrade to commons-compress 1.26.1 due to cves > -- > > Key: HADOOP-19114 > URL: https://issues.apache.org/jira/browse/HADOOP-19114 > Project: Hadoop Common > Issue Type: Bug > Components: build, CVE >Affects Versions: 3.4.0 >Reporter: PJ Fanning >Assignee: PJ Fanning >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > 2 recent CVEs fixed - > https://mvnrepository.com/artifact/org.apache.commons/commons-compress > Important: Denial of Service CVE-2024-25710 > Moderate: Denial of Service CVE-2024-26308 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19197) S3A: Support AWS KMS Encryption Context
Raphael Azzolini created HADOOP-19197: - Summary: S3A: Support AWS KMS Encryption Context Key: HADOOP-19197 URL: https://issues.apache.org/jira/browse/HADOOP-19197 Project: Hadoop Common Issue Type: New Feature Components: fs/s3 Affects Versions: 3.4.0 Reporter: Raphael Azzolini S3A properties allow users to choose the AWS KMS key ({_}fs.s3a.encryption.key{_}) and S3 encryption algorithm to be used ({_}fs.s3a.encryption.algorithm{_}). In addition to the AWS KMS Key, an encryption context can be used as non-secret data that adds additional integrity and authenticity checks to the encrypted data. However, there is no option to specify the [AWS KMS Encryption Context|https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#encrypt_context] in S3A. In AWS SDK v2 the encryption context in S3 requests is set by the parameter [ssekmsEncryptionContext.|https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/model/CreateMultipartUploadRequest.Builder.html#ssekmsEncryptionContext(java.lang.String)] It receives a base64-encoded UTF-8 string holding JSON with the encryption context key-value pairs. The value of this parameter could be set by the user in a new property {_}*fs.s3a.encryption.context*{_}, and be stored in the [EncryptionSecrets|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/delegation/EncryptionSecrets.java] to later be used when setting the encryption parameters in [RequestFactoryImpl|https://github.com/apache/hadoop/blob/f92a8ab8ae54f11946412904973eb60404dee7ff/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/RequestFactoryImpl.java]. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
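The encoding step described above is mechanical. A sketch of turning a JSON context into the value the SDK expects, using the ssekmsEncryptionContext setter linked in the description (the fs.s3a.encryption.context property name is, per the text, still a proposal):

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Base64;

import software.amazon.awssdk.services.s3.model.CreateMultipartUploadRequest;

public class KmsEncryptionContext {

  // Base64-encode the UTF-8 JSON key-value pairs and attach them
  // to the multipart-upload request.
  static CreateMultipartUploadRequest.Builder withContext(
      CreateMultipartUploadRequest.Builder builder, String contextJson) {
    String encoded = Base64.getEncoder()
        .encodeToString(contextJson.getBytes(StandardCharsets.UTF_8));
    return builder.ssekmsEncryptionContext(encoded);
  }

  // e.g. withContext(builder, "{\"team\":\"analytics\"}")
}
{code}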
[jira] [Created] (HADOOP-19196) Bulk delete api doesn't take the path to delete as the base path
Steve Loughran created HADOOP-19196: --- Summary: Bulk delete api doesn't take the path to delete as the base path Key: HADOOP-19196 URL: https://issues.apache.org/jira/browse/HADOOP-19196 Project: Hadoop Common Issue Type: Bug Components: fs Affects Versions: 3.5.0, 3.4.1 Reporter: Steve Loughran If you use the path of the file you intend to delete as the base path, you get an error. This is because the validation requires the list to be of children, but the base path itself should be valid. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19195) Upgrade aws sdk v2 to 2.25.53
Harshit Gupta created HADOOP-19195: -- Summary: Upgrade aws sdk v2 to 2.25.53 Key: HADOOP-19195 URL: https://issues.apache.org/jira/browse/HADOOP-19195 Project: Hadoop Common Issue Type: Improvement Components: fs/s3 Affects Versions: 3.5.0, 3.4.1 Reporter: Harshit Gupta Assignee: Harshit Gupta Fix For: 3.5.0 Upgrade aws sdk v2 to 2.25.53 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19194) Add test to find unshaded dependencies in the aws sdk
Harshit Gupta created HADOOP-19194: -- Summary: Add test to find unshaded dependencies in the aws sdk Key: HADOOP-19194 URL: https://issues.apache.org/jira/browse/HADOOP-19194 Project: Hadoop Common Issue Type: Improvement Components: fs/s3 Affects Versions: 3.4.0 Reporter: Harshit Gupta Assignee: Harshit Gupta Fix For: 3.4.1 Write a test to assess the aws sdk for unshaded artefacts on the class path which might cause deployment failures. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19193) Create orphan commit for website deployment
[ https://issues.apache.org/jira/browse/HADOOP-19193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19193. - Fix Version/s: 3.5.0 Resolution: Fixed > Create orphan commit for website deployment > --- > > Key: HADOOP-19193 > URL: https://issues.apache.org/jira/browse/HADOOP-19193 > Project: Hadoop Common > Issue Type: Improvement > Components: documentation >Reporter: Cheng Pan >Assignee: Cheng Pan >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19193) Create orphan commit for website deployment
Cheng Pan created HADOOP-19193: -- Summary: Create orphan commit for website deployment Key: HADOOP-19193 URL: https://issues.apache.org/jira/browse/HADOOP-19193 Project: Hadoop Common Issue Type: Improvement Components: documentation Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19192) Log level is WARN when fail to load native hadoop libs
Cheng Pan created HADOOP-19192: -- Summary: Log level is WARN when fail to load native hadoop libs Key: HADOOP-19192 URL: https://issues.apache.org/jira/browse/HADOOP-19192 Project: Hadoop Common Issue Type: Improvement Components: documentation Affects Versions: 3.3.6 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19188) TestHarFileSystem and TestFilterFileSystem failing after bulk delete API added
[ https://issues.apache.org/jira/browse/HADOOP-19188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19188. - Resolution: Fixed > TestHarFileSystem and TestFilterFileSystem failing after bulk delete API added > -- > > Key: HADOOP-19188 > URL: https://issues.apache.org/jira/browse/HADOOP-19188 > Project: Hadoop Common > Issue Type: Bug > Components: fs, test >Affects Versions: 3.5.0, 3.4.1 >Reporter: Steve Loughran >Assignee: Mukund Thakur >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0, 3.4.1 > > > oh, we need to update a couple of tests so they know not to worry about the > new interface/method. The details are in the javadocs of FileSystem. > Interesting these snuck through yetus, though they fail in PRs based atop > #6726 > {code} > [ERROR] Failures: > [ERROR] org.apache.hadoop.fs.TestFilterFileSystem.testFilterFileSystem > [ERROR] Run 1: TestFilterFileSystem.testFilterFileSystem:181 1 methods were > not overridden correctly - see log > [ERROR] Run 2: TestFilterFileSystem.testFilterFileSystem:181 1 methods were > not overridden correctly - see log > [ERROR] Run 3: TestFilterFileSystem.testFilterFileSystem:181 1 methods were > not overridden correctly - see log > [INFO] > [ERROR] org.apache.hadoop.fs.TestHarFileSystem.testInheritedMethodsImplemented > [ERROR] Run 1: TestHarFileSystem.testInheritedMethodsImplemented:402 1 > methods were not overridden correctly - see log > [ERROR] Run 2: TestHarFileSystem.testInheritedMethodsImplemented:402 1 > methods were not overridden correctly - see log > [ERROR] Run 3: TestHarFileSystem.testInheritedMethodsImplemented:402 1 > methods were not overridden correctly - see log > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19191) Batch APIs for delete
[ https://issues.apache.org/jira/browse/HADOOP-19191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19191. - Resolution: Duplicate Fixed in HADOOP-18679; there's an iceberg PR up to use the reflection-friendly WrappedIO access point. That feature will ship in hadoop 3.4.1; i would like a basic backport to branch-3.3 where even though the full s3a-side backport would be impossible (sdk versions...), we could at least offer the public API to all and the page-size=1 DELETE call for S3, *without any safety checks*. it'll still save some LIST calls and encourage adoption. If you want to get involved there, happy to take PRs (under the original JIRA) > Batch APIs for delete > - > > Key: HADOOP-19191 > URL: https://issues.apache.org/jira/browse/HADOOP-19191 > Project: Hadoop Common > Issue Type: New Feature > Components: fs >Reporter: Alkis Evlogimenos >Priority: Major > > Add batch APIs with for delete to allow better performance for object stores: > {{boolean[] delete(Path[] paths);}} > The API should have a default implementation that delegates to the singular > delete. Implementations can override to provide better performance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
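For anyone evaluating the take-up route mentioned here, a rough usage sketch of the WrappedIO entry point from HADOOP-18679; the method name and return type follow that work as I understand it, but treat the exact signature as an assumption to verify, and the file names below are hypothetical:

{code:java}
import java.util.Arrays;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.wrappedio.WrappedIO;

public class BulkDeleteSketch {

  static void deleteChildren(FileSystem fs, Path base) throws Exception {
    List<Path> files = Arrays.asList(
        new Path(base, "part-0000"),   // hypothetical children of base
        new Path(base, "part-0001"));
    // Returns the (path, error) pairs that could not be deleted.
    List<Map.Entry<Path, String>> failures =
        WrappedIO.bulkDelete_delete(fs, base, files);
    if (!failures.isEmpty()) {
      throw new IllegalStateException("failed to delete: " + failures);
    }
  }
}
{code}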
[jira] [Created] (HADOOP-19191) Batch APIs for delete
Alkis Evlogimenos created HADOOP-19191: -- Summary: Batch APIs for delete Key: HADOOP-19191 URL: https://issues.apache.org/jira/browse/HADOOP-19191 Project: Hadoop Common Issue Type: Bug Components: fs Reporter: Alkis Evlogimenos Add batch APIs with for delete to allow better performance for object stores: {{boolean[] delete(Path[] paths);}} The API should have a default implementation that delegates to the singular delete. Implementations can override to provide better performance. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19190) Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when bucket not encrypted with sse-kms
[ https://issues.apache.org/jira/browse/HADOOP-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-19190. Resolution: Fixed > Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes > when bucket not encrypted with sse-kms > > > Key: HADOOP-19190 > URL: https://issues.apache.org/jira/browse/HADOOP-19190 > Project: Hadoop Common > Issue Type: Test > Components: fs/s3 >Affects Versions: 3.4.1 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Minor > Labels: pull-request-available > Fix For: 3.4.1 > > > [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 12.80 > s <<< FAILURE! -- in > org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings > [ERROR] > org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes > -- Time elapsed: 5.065 s <<< FAILURE! > org.junit.ComparisonFailure: [Server side encryption algorithm must match] > expected:<"[aws:kms]"> but was:<"[AES256]"> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native > Method) > at > sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) > at > sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) > at > org.apache.hadoop.fs.s3a.EncryptionTestUtils.validateEncryptionFileAttributes(EncryptionTestUtils.java:138) > at > org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes(ITestS3AEncryptionWithDefaultS3Settings.java:112) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19190) Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when bucket not encrypted with sse-kms
Mukund Thakur created HADOOP-19190: -- Summary: Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when bucket not encrypted with sse-kms Key: HADOOP-19190 URL: https://issues.apache.org/jira/browse/HADOOP-19190 Project: Hadoop Common Issue Type: Test Components: fs/s3 Affects Versions: 3.4.1 Reporter: Mukund Thakur Assignee: Mukund Thakur [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 12.80 s <<< FAILURE! -- in org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings [ERROR] org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes -- Time elapsed: 5.065 s <<< FAILURE! org.junit.ComparisonFailure: [Server side encryption algorithm must match] expected:<"[aws:kms]"> but was:<"[AES256]"> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) at org.apache.hadoop.fs.s3a.EncryptionTestUtils.validateEncryptionFileAttributes(EncryptionTestUtils.java:138) at org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes(ITestS3AEncryptionWithDefaultS3Settings.java:112) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59) -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19189) ITestS3ACommitterFactory failing
Steve Loughran created HADOOP-19189: --- Summary: ITestS3ACommitterFactory failing Key: HADOOP-19189 URL: https://issues.apache.org/jira/browse/HADOOP-19189 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3, test Affects Versions: 3.4.0 Reporter: Steve Loughran we've had ITestS3ACommitterFactory failing for a while, where it looks like changed committer settings aren't being picked up. {code} ERROR] ITestS3ACommitterFactory.testEverything:115->testInvalidFileBinding:165 Expected a org.apache.hadoop.fs.s3a.commit.PathCommitException to be thrown, but got the result: : FileOutputCommitter{PathOutputCommitter{context=TaskAttemptContextImpl{JobContextImpl {code} I've spent some time looking at it and it is happening because the test sets the filesystem ref for the local test fs, and not that of the filesystem created by the committer, which is where the option is picked up. I've tried to parameterize it but things are still playing up and I'm not sure how hard to try to fix. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19156) ZooKeeper based state stores use different ZK address configs
[ https://issues.apache.org/jira/browse/HADOOP-19156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiaoqiao He resolved HADOOP-19156. -- Fix Version/s: 3.5.0 Hadoop Flags: Reviewed Resolution: Fixed > ZooKeeper based state stores use different ZK address configs > - > > Key: HADOOP-19156 > URL: https://issues.apache.org/jira/browse/HADOOP-19156 > Project: Hadoop Common > Issue Type: Improvement >Reporter: liu bin >Assignee: liu bin >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > Currently, the Zookeeper-based state stores of RM, YARN Federation, and HDFS > Federation use the same ZK address config {{hadoop.zk.address}}. But in > our production environment, we hope that different services can use different > ZKs to avoid mutual influence. > This jira adds separate ZK address configs for each service. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18679) Add API for bulk/paged delete of files and objects
[ https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-18679. Resolution: Fixed > Add API for bulk/paged delete of files and objects > -- > > Key: HADOOP-18679 > URL: https://issues.apache.org/jira/browse/HADOOP-18679 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Affects Versions: 3.3.5 >Reporter: Steve Loughran >Priority: Major > Labels: pull-request-available > Fix For: 3.4.1 > > > iceberg and hbase could benefit from being able to give a list of individual > files to delete - files which may be scattered round the bucket for better > read performance. > Add some new optional interface for an object store which allows a caller to > submit a list of paths to files to delete, where > the expectation is > * if a path is a file: delete > * if a path is a dir, outcome undefined > For s3 that'd let us build these into DeleteRequest objects, and submit, > without any probes first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19184) TestStagingCommitter.testJobCommitFailure failing
[ https://issues.apache.org/jira/browse/HADOOP-19184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mukund Thakur resolved HADOOP-19184. Fix Version/s: 3.4.1 Resolution: Fixed > TestStagingCommitter.testJobCommitFailure failing > -- > > Key: HADOOP-19184 > URL: https://issues.apache.org/jira/browse/HADOOP-19184 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/s3 >Reporter: Mukund Thakur >Assignee: Mukund Thakur >Priority: Critical > Labels: pull-request-available > Fix For: 3.4.1 > > > {code:java} > [INFO] > [ERROR] Failures: > [ERROR] TestStagingCommitter.testJobCommitFailure:662 [Committed objects > compared to deleted paths > org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@1b4ab85{ > requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, > deletes=0}] > Expecting: > > <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566", > > "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11", > > "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f", > > "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45", > > "s3a://bucket-name/output/path/r_1_1_600b7e65-a7ff-4d07-b763-c4339a9164ad"]> > to contain exactly in any order: > <[]> > but the following elements were unexpected: > > <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566", > > "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11", > > "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f", > > "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45",{code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19188) TestHarFileSystem and TestFilterFileSystem failing after bulk delete API added
Steve Loughran created HADOOP-19188: --- Summary: TestHarFileSystem and TestFilterFileSystem failing after bulk delete API added Key: HADOOP-19188 URL: https://issues.apache.org/jira/browse/HADOOP-19188 Project: Hadoop Common Issue Type: Bug Components: fs, test Affects Versions: 3.5.0 Reporter: Steve Loughran Assignee: Mukund Thakur oh, we need to update a couple of tests so they know not to worry about the new interface/method. The details are in the javadocs of FileSystem. Interesting these snuck through yetus, though they fail in PRs based atop #6726 {code} [ERROR] Failures: [ERROR] org.apache.hadoop.fs.TestFilterFileSystem.testFilterFileSystem [ERROR] Run 1: TestFilterFileSystem.testFilterFileSystem:181 1 methods were not overridden correctly - see log [ERROR] Run 2: TestFilterFileSystem.testFilterFileSystem:181 1 methods were not overridden correctly - see log [ERROR] Run 3: TestFilterFileSystem.testFilterFileSystem:181 1 methods were not overridden correctly - see log [INFO] [ERROR] org.apache.hadoop.fs.TestHarFileSystem.testInheritedMethodsImplemented [ERROR] Run 1: TestHarFileSystem.testInheritedMethodsImplemented:402 1 methods were not overridden correctly - see log [ERROR] Run 2: TestHarFileSystem.testInheritedMethodsImplemented:402 1 methods were not overridden correctly - see log [ERROR] Run 3: TestHarFileSystem.testInheritedMethodsImplemented:402 1 methods were not overridden correctly - see log {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19187) ABFS: Making AbfsClient Abstract for supporting both DFS and Blob Endpoint
Anuj Modi created HADOOP-19187: -- Summary: ABFS: Making AbfsClient Abstract for supporting both DFS and Blob Endpoint Key: HADOOP-19187 URL: https://issues.apache.org/jira/browse/HADOOP-19187 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure Affects Versions: 3.4.0 Reporter: Anuj Modi Assignee: Anuj Modi Fix For: 3.5.0, 3.4.1 Azure Services support two different sets of APIs. Blob: [https://learn.microsoft.com/en-us/rest/api/storageservices/blob-service-rest-api] DFS: [https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/operation-groups] As per the plan in HADOOP-19179, this task enables the ABFS driver to work with both sets of APIs as per the requirement. The scope of this task is to refactor the AbfsClient so that ABFSStore can choose to interact with the client it wants based on the endpoint configured by the user. The blob endpoint support will remain "Unsupported" until the whole code is checked in and well tested. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18962) Upgrade kafka to 3.4.0
[ https://issues.apache.org/jira/browse/HADOOP-18962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18962. - Fix Version/s: 3.5.0 Resolution: Fixed > Upgrade kafka to 3.4.0 > -- > > Key: HADOOP-18962 > URL: https://issues.apache.org/jira/browse/HADOOP-18962 > Project: Hadoop Common > Issue Type: Bug >Reporter: D M Murali Krishna Reddy >Assignee: D M Murali Krishna Reddy >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > Upgrade kafka-clients to 3.4.0 to fix > https://nvd.nist.gov/vuln/detail/CVE-2023-25194 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
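For downstream projects tracking the same CVE, the change reduces to pinning the dependency. A minimal sketch of the Maven side, assuming the standard org.apache.kafka coordinates:
{code:xml}
<!-- Pin kafka-clients to the patched release (CVE-2023-25194). -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.apache.kafka</groupId>
      <artifactId>kafka-clients</artifactId>
      <version>3.4.0</version>
    </dependency>
  </dependencies>
</dependencyManagement>
{code}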
[jira] [Created] (HADOOP-19186) Change loglevel to ERROR/WARNING so that it would be easy to identify the problem without ignoring it
Srinivasu Majeti created HADOOP-19186: - Summary: Change loglevel to ERROR/WARNING so that it would be easy to identify the problem without ignoring it Key: HADOOP-19186 URL: https://issues.apache.org/jira/browse/HADOOP-19186 Project: Hadoop Common Issue Type: Improvement Components: security Reporter: Srinivasu Majeti On a new host with Java version 11, the DN was not able to communicate with the NN. We enabled DEBUG logging for the DN, and the message below was logged at DEBUG level. DEBUG org.apache.hadoop.security.UserGroupInformation: PrivilegedActionException as:hdfs/av3l704p.bigdata.it.internal@PRODUCTION.LOCAL (auth:KERBEROS) cause:javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Receive timed out)] Without DEBUG-level logging, this showed up only as the WARNING below WARN org.apache.hadoop.ipc.Client: Couldn't setup connection for hdfs/av3l704p.bigdata.it.internal@PRODUCTION.LOCAL to avl2785p.bigdata.it.internal/172.24.178.32:8022 javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Receive timed out)] A considerable amount of time was spent troubleshooting this issue because the exception detail was logged at DEBUG level, which made it difficult to track in the logs. Can we have such critical messages logged at WARN/ERROR level so that they are not missed, without having to enable DEBUG-level logging for datanodes? -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
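Until the level is changed in the code, a narrower workaround than cluster-wide DEBUG is to raise verbosity only for the classes emitting these messages. A sketch of a log4j.properties fragment, assuming the stock log4j 1.x configuration Hadoop daemons ship with and the logger names visible in the messages above:
{code}
# Enable DEBUG only for the classes that log the GSS/Kerberos failure
# detail, keeping the rest of the datanode at its usual level.
log4j.logger.org.apache.hadoop.security.UserGroupInformation=DEBUG
log4j.logger.org.apache.hadoop.ipc.Client=DEBUG
{code}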
[jira] [Resolved] (HADOOP-19168) Upgrade Kafka Clients due to CVEs
[ https://issues.apache.org/jira/browse/HADOOP-19168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19168. - Resolution: Duplicate rohit, dupe of HADOOP-18962. let's focus on that > Upgrade Kafka Clients due to CVEs > - > > Key: HADOOP-19168 > URL: https://issues.apache.org/jira/browse/HADOOP-19168 > Project: Hadoop Common > Issue Type: Task >Reporter: Rohit Kumar >Priority: Major > Labels: pull-request-available > > Upgrade Kafka Clients due to CVEs > CVE-2023-25194: Affected versions of this package are vulnerable to > Deserialization of Untrusted Data when there are gadgets in the > {{classpath}}. The server will connect to the attacker's LDAP server and > deserialize the LDAP response, which the attacker can use to execute java > deserialization gadget chains on the Kafka connect server. > CVSS Score: 8.8 (High) > [https://nvd.nist.gov/vuln/detail/CVE-2023-25194] > CVE-2021-38153 > CVE-2018-17196 > Insufficient Entropy > [https://security.snyk.io/package/maven/org.apache.kafka:kafka-clients] > Upgrade Kafka-Clients to 3.4.0 or higher. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19182) Upgrade kafka to 3.4.0
[ https://issues.apache.org/jira/browse/HADOOP-19182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19182. - Resolution: Duplicate > Upgrade kafka to 3.4.0 > -- > > Key: HADOOP-19182 > URL: https://issues.apache.org/jira/browse/HADOOP-19182 > Project: Hadoop Common > Issue Type: Bug > Components: build >Reporter: fuchaohong >Priority: Major > Labels: pull-request-available > > Upgrade kafka to 3.4.0 to resolve CVE-2023-25194 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19185) Improve ABFS metric integration with IOStatistics
Steve Loughran created HADOOP-19185: --- Summary: Improve ABFS metric integration with IOStatistics Key: HADOOP-19185 URL: https://issues.apache.org/jira/browse/HADOOP-19185 Project: Hadoop Common Issue Type: Sub-task Components: fs/azure Reporter: Steve Loughran Followup to HADOOP-18325 covering the outstanding comments of https://github.com/apache/hadoop/pull/6314/files -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-18325) ABFS: Add correlated metric support for ABFS operations
[ https://issues.apache.org/jira/browse/HADOOP-18325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-18325. - Fix Version/s: 3.5.0 Resolution: Fixed > ABFS: Add correlated metric support for ABFS operations > --- > > Key: HADOOP-18325 > URL: https://issues.apache.org/jira/browse/HADOOP-18325 > Project: Hadoop Common > Issue Type: Sub-task > Components: fs/azure >Affects Versions: 3.3.3 >Reporter: Anmol Asrani >Assignee: Anmol Asrani >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > Add metrics related to a particular job: the total number of requests, > retried requests, the retry count, and others -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19184) TestStagingCommitter.testJobCommitFailure failing
Mukund Thakur created HADOOP-19184: -- Summary: TestStagingCommitter.testJobCommitFailure failing Key: HADOOP-19184 URL: https://issues.apache.org/jira/browse/HADOOP-19184 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Reporter: Mukund Thakur Assignee: Mukund Thakur [INFO] [ERROR] Failures: [ERROR] TestStagingCommitter.testJobCommitFailure:662 [Committed objects compared to deleted paths org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@2de1acf4{ requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, deletes=0}] Expecting: <["s3a://bucket-name/output/path/r_0_0_c055250c-58c7-47ea-8b14-215cb5462e89", "s3a://bucket-name/output/path/r_1_1_9111aa65-96c2-465c-b278-696aff7707e3", "s3a://bucket-name/output/path/r_0_0_dec7f398-ee4e-4a53-a783-6b72cead569a", "s3a://bucket-name/output/path/r_1_1_39ad0eba-1053-4217-aa63-ddc8edfa7c64", "s3a://bucket-name/output/path/r_0_0_6c0518f6-7c1b-418f-a3e4-7db568880e6a"]> to contain exactly in any order: <[]> but the following elements were unexpected: <["s3a://bucket-name/output/path/r_0_0_c055250c-58c7-47ea-8b14-215cb5462e89", "s3a://bucket-name/output/path/r_1_1_9111aa65-96c2-465c-b278-696aff7707e3", "s3a://bucket-name/output/path/r_0_0_dec7f398-ee4e-4a53-a783-6b72cead569a", "s3a://bucket-name/output/path/r_1_1_39ad0eba-1053-4217-aa63-ddc8edfa7c64", "s3a://bucket-name/output/path/r_0_0_6c0518f6-7c1b-418f-a3e4-7db568880e6a"]> -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Created] (HADOOP-19183) RBF: Support leader follower mode for multiple subclusters
Yuanbo Liu created HADOOP-19183: --- Summary: RBF: Support leader follower mode for multiple subclusters Key: HADOOP-19183 URL: https://issues.apache.org/jira/browse/HADOOP-19183 Project: Hadoop Common Issue Type: Improvement Components: RBF Reporter: Yuanbo Liu Currently there are five ordering modes for multiple subclusters: HASH, LOCAL, RANDOM, HASH_ALL, and SPACE. This proposes a new leader/follower mode: routers try to write to the leader subcluster as much as possible, and when reading data they rank the leader subcluster first. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
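As a sketch of the read-side ranking (illustrative only; RBF's actual ordering resolvers live under org.apache.hadoop.hdfs.server.federation.resolver, and the class below is hypothetical), the ordering can be a stable sort that moves the leader to the front:
{code:java}
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical helper: rank the leader subcluster first for reads. */
public class LeaderFirstOrder {
  public static List<String> rank(List<String> subclusters, String leader) {
    List<String> ordered = new ArrayList<>(subclusters);
    // Stable sort: the leader moves to the front, follower order is preserved.
    ordered.sort(Comparator.comparingInt(ns -> ns.equals(leader) ? 0 : 1));
    return ordered;
  }
}
{code}
On the write path the router would simply pick the first element of the ranked list, falling back to followers only when the leader is unavailable.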
[jira] [Created] (HADOOP-19182) Upgrade kafka to 3.4.0
fuchaohong created HADOOP-19182: --- Summary: Upgrade kafka to 3.4.0 Key: HADOOP-19182 URL: https://issues.apache.org/jira/browse/HADOOP-19182 Project: Hadoop Common Issue Type: Bug Components: build Reporter: fuchaohong -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-19163) Upgrade protobuf version to 3.25.3
[ https://issues.apache.org/jira/browse/HADOOP-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved HADOOP-19163. - Resolution: Fixed done. not sure what version to tag with. Proposed: we cut a new release of this > Upgrade protobuf version to 3.25.3 > -- > > Key: HADOOP-19163 > URL: https://issues.apache.org/jira/browse/HADOOP-19163 > Project: Hadoop Common > Issue Type: Bug > Components: hadoop-thirdparty >Reporter: Bilwa S T >Assignee: Bilwa S T >Priority: Major > Labels: pull-request-available > Time Spent: 20m > Remaining Estimate: 0h > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
[jira] [Resolved] (HADOOP-13147) Constructors must not call overrideable methods in PureJavaCrc32C
[ https://issues.apache.org/jira/browse/HADOOP-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ayush Saxena resolved HADOOP-13147. --- Fix Version/s: 3.5.0 Hadoop Flags: Reviewed Resolution: Fixed > Constructors must not call overrideable methods in PureJavaCrc32C > - > > Key: HADOOP-13147 > URL: https://issues.apache.org/jira/browse/HADOOP-13147 > Project: Hadoop Common > Issue Type: Bug >Affects Versions: 2.0.6-alpha > Environment: > http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/PureJavaCrc32C.java >Reporter: Sebb >Assignee: Sebb >Priority: Blocker > Labels: pull-request-available > Fix For: 3.5.0 > > > Constructors must not call overrideable methods. > An object is not guaranteed fully constructed until the constructor exits, so > the subclass override may not see the fully created parent object. > This applies to: > PureJavaCrc32 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
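The hazard is easy to demonstrate in isolation. A self-contained example (not the PureJavaCrc32C code itself) showing a subclass override observing an uninitialized field:
{code:java}
// Why constructors must not call overridable methods: the override runs
// before the subclass's own field initializers have executed.
class Base {
  Base() {
    init();                 // dispatches to Sub.init() below
  }
  protected void init() { }
}

class Sub extends Base {
  private String table = "crc-table";   // assigned only after super() returns

  @Override
  protected void init() {
    // Invoked from Base's constructor, so 'table' is still null here.
    System.out.println("table = " + table);
  }
}

public class ConstructorHazard {
  public static void main(String[] args) {
    new Sub();   // prints "table = null"
  }
}
{code}
The usual fix, as applied here, is to make the method private or final, or to move the call out of the constructor entirely.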
[jira] [Created] (HADOOP-19181) IAMCredentialsProvider throttle failures
Steve Loughran created HADOOP-19181: --- Summary: IAMCredentialsProvider throttle failures Key: HADOOP-19181 URL: https://issues.apache.org/jira/browse/HADOOP-19181 Project: Hadoop Common Issue Type: Sub-task Components: fs/s3 Affects Versions: 3.4.0 Reporter: Steve Loughran Tests report throttling errors in IAM being remapped to noauth and failure. Again, impala tests, but with multiple processes on the same host. This means that HADOOP-18945 isn't sufficient: even if it ensures a singleton instance for a process,
* it doesn't if there are many test buckets (fixable)
* it doesn't work across processes (not fixable)

We may be able to:
* use a singleton across all filesystem instances (a sketch follows after the stack trace below)
* once we know how throttling is reported, handle it through retries + error/stats collection

{code}
2024-02-17T18:02:10,175 WARN [TThreadPoolServer WorkerProcess-22] fs.FileSystem: Failed to initialize fileystem s3a://impala-test-uswest2-1/test-warehouse/test_num_values_def_levels_mismatch_15b31ddb.db/too_many_def_levels: java.nio.file.AccessDeniedException: impala-test-uswest2-1: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
2024-02-17T18:02:10,175 ERROR [TThreadPoolServer WorkerProcess-22] utils.MetaStoreUtils: Got exception: java.nio.file.AccessDeniedException impala-test-uswest2-1: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
java.nio.file.AccessDeniedException: impala-test-uswest2-1: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : software.amazon.awssdk.core.exception.SdkClientException: Unable to load credentials from system settings. Access key must be specified either via environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
at org.apache.hadoop.fs.s3a.AWSCredentialProviderList.maybeTranslateCredentialException(AWSCredentialProviderList.java:351) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:201) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:124) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:376) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:372) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:347) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$2(S3AFileSystem.java:972) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543) ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524) ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445) ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2748) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:970) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.s3a.S3AFileSystem.doBucketProbing(S3AFileSystem.java:859) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:715) ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3452) ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?] at org.apache.hadoop.fs.FileSystem.access
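A sketch of the singleton idea referenced in the list above, sharing one credential-provider instance across every filesystem instance in the process. The class is hypothetical and deliberately generic; the real change would live around IAMInstanceCredentialsProvider:
{code:java}
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: one shared instance per JVM, so every S3A
 * filesystem polls the IAM endpoint through the same provider rather
 * than once per bucket. This cannot help across processes on one host.
 */
public final class SharedInstance<T> {
  private final Supplier<T> factory;
  private final AtomicReference<T> ref = new AtomicReference<>();

  public SharedInstance(Supplier<T> factory) {
    this.factory = factory;
  }

  public T get() {
    T existing = ref.get();
    if (existing != null) {
      return existing;
    }
    // Benign race: several threads may build one, exactly one is kept.
    ref.compareAndSet(null, factory.get());
    return ref.get();
  }
}
{code}
As the description notes, the cross-process case still needs throttling to be recognized and retried once we know how it surfaces from the SDK.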
[jira] [Resolved] (HADOOP-19167) Change of Codec configuration does not work
[ https://issues.apache.org/jira/browse/HADOOP-19167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ZanderXu resolved HADOOP-19167. --- Fix Version/s: 3.5.0 Resolution: Fixed > Change of Codec configuration does not work > --- > > Key: HADOOP-19167 > URL: https://issues.apache.org/jira/browse/HADOOP-19167 > Project: Hadoop Common > Issue Type: Bug > Components: compress >Reporter: Zhikai Hu >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0 > >
> In one of my projects, I need to dynamically adjust the compression level for different files.
> However, I found that in most cases the new compression level does not take effect as expected; the old compression level continues to be used.
> Here is the relevant code snippet:
> ZStandardCodec zStandardCodec = new ZStandardCodec();
> zStandardCodec.setConf(conf);
> conf.set("io.compression.codec.zstd.level", "5"); // level may change dynamically
> conf.set("io.compression.codec.zstd", zStandardCodec.getClass().getName());
> writer = SequenceFile.createWriter(conf,
>     SequenceFile.Writer.file(sequenceFilePath),
>     SequenceFile.Writer.keyClass(LongWritable.class),
>     SequenceFile.Writer.valueClass(BytesWritable.class),
>     SequenceFile.Writer.compression(CompressionType.BLOCK));
> The reason is that the SequenceFile.Writer.init() method calls CodecPool.getCompressor(codec, null) to get a compressor.
> If the compressor is a reused instance, the conf is not applied because it is passed as null:
> public static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
>   Compressor compressor = borrow(compressorPool, codec.getCompressorType());
>   if (compressor == null) {
>     compressor = codec.createCompressor();
>     LOG.info("Got brand-new compressor ["+codec.getDefaultExtension()+"]");
>   } else {
>     compressor.reinit(conf); // conf is null here
>     ..
> Please also refer to my unit test to reproduce the bug.
> To address this bug, I modified the code to ensure that the configuration is read back from the codec when a compressor is reused. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org For additional commands, e-mail: common-dev-h...@hadoop.apache.org
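A cleaned-up sketch of the fix direction the reporter describes: when a pooled compressor is reused and the caller passed no configuration, read it back from the codec itself. This is illustrative, not the committed patch; the helper class is hypothetical, and only standard hadoop-common types are used.
{code:java}
import org.apache.hadoop.conf.Configurable;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.io.compress.CompressionCodec;
import org.apache.hadoop.io.compress.Compressor;

/** Hypothetical helper showing the suggested fallback for reused compressors. */
public final class CompressorReuse {
  private CompressorReuse() { }

  /**
   * Reinitialize a pooled compressor. If the caller passed no configuration
   * (as SequenceFile.Writer does), fall back to the codec's own conf so
   * settings such as io.compression.codec.zstd.level take effect.
   */
  public static void reinitForReuse(Compressor compressor,
                                    CompressionCodec codec,
                                    Configuration conf) {
    Configuration effective = conf;
    if (effective == null && codec instanceof Configurable) {
      effective = ((Configurable) codec).getConf();
    }
    compressor.reinit(effective);
  }
}
{code}
The key point is that Compressor.reinit() silently keeps the old settings when handed null, which is why the reused instance ignored the new compression level.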