[jira] [Created] (HADOOP-19245) S3ABlockOutputStream no longer sends progress events in close()

2024-08-01 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19245:
---

 Summary: S3ABlockOutputStream no longer sends progress events in 
close()
 Key: HADOOP-19245
 URL: https://issues.apache.org/jira/browse/HADOOP-19245
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Progress events are no longer passed through from S3ABlockOutputStream to a
Progressable instance passed in unless that instance also implements
ProgressListener.

This is due to a signature mismatch between the changed ProgressableListener
interface and the {{S3ABlockOutputStream.ProgressListener}} implementation.

* critical because distcp jobs will time out on large uploads without this
* trivial to fix, but it does need a test (see the sketch below)
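
The failure mode is the classic silent-overload trap. A hypothetical sketch with simplified names (not the actual Hadoop classes):

{code:java}
// Hypothetical sketch of the failure mode; simplified names, not the
// actual Hadoop classes.
interface ListenerApi {
  // signature after the change; the default is a silent no-op
  default void progressChanged(long bytesTransferred, int blocksUploaded) { }
}

class StreamListener implements ListenerApi {
  // old signature: once the interface changed, this became a mere overload,
  // so callbacks hit the no-op default and the wrapped Progressable never
  // gets pinged. An @Override annotation on the intended method would have
  // turned this mismatch into a compile error.
  public void progressChanged(long bytesTransferred) { }
}
{code}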






[jira] [Created] (HADOOP-19244) Pullout arch-agnostic maven javadoc plugin configurations in hadoop-common

2024-08-01 Thread Cheng Pan (Jira)
Cheng Pan created HADOOP-19244:
--

 Summary: Pullout arch-agnostic maven javadoc plugin configurations 
in hadoop-common
 Key: HADOOP-19244
 URL: https://issues.apache.org/jira/browse/HADOOP-19244
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build, common
Reporter: Cheng Pan









[jira] [Created] (HADOOP-19243) Upgrade Mockito version to 4.11.0

2024-07-31 Thread Muskan Mishra (Jira)
Muskan Mishra created HADOOP-19243:
--

 Summary: Upgrade Mockito version to 4.11.0
 Key: HADOOP-19243
 URL: https://issues.apache.org/jira/browse/HADOOP-19243
 Project: Hadoop Common
  Issue Type: Task
Reporter: Muskan Mishra
Assignee: Muskan Mishra


While compiling test classes with JDK 17, we faced an error related to Mockito:
*Mockito cannot mock this class.*
To make the build compatible with JDK 17, we have to upgrade the versions of
mockito-core as well as mockito-inline.
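
A sketch of the corresponding dependency bump (the exact pom and property name in the Hadoop build may differ):

{code:xml}
<!-- sketch: bump both artifacts together; the real change probably goes
     through a version property in hadoop-project/pom.xml -->
<dependency>
  <groupId>org.mockito</groupId>
  <artifactId>mockito-core</artifactId>
  <version>4.11.0</version>
  <scope>test</scope>
</dependency>
<dependency>
  <groupId>org.mockito</groupId>
  <artifactId>mockito-inline</artifactId>
  <version>4.11.0</version>
  <scope>test</scope>
</dependency>
{code}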






[jira] [Created] (HADOOP-19242) Add a feature to disable redirection for the OSS connector.

2024-07-31 Thread zhouao (Jira)
zhouao created HADOOP-19242:
---

 Summary: Add a feature to disable redirection for the OSS 
connector.
 Key: HADOOP-19242
 URL: https://issues.apache.org/jira/browse/HADOOP-19242
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/oss
Affects Versions: 3.3.2, 3.1.0
Reporter: zhouao


For security reasons, some users of the OSS connector wish to disable the
connector's HTTP redirection functionality. The OSS Java SDK has the
capability to turn off HTTP redirection, but the configuration is not exposed
in the {{core-site.xml}} file. This change primarily involves adding a flag in
{{core-site.xml}} to disable HTTP redirection.
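
A sketch of what the flag could look like in {{core-site.xml}} (the property name here is hypothetical; the patch would define the real key):

{code:xml}
<!-- hypothetical property name, for illustration only -->
<property>
  <name>fs.oss.connection.redirect.enabled</name>
  <value>false</value>
  <description>Whether the OSS client follows HTTP redirects.</description>
</property>
{code}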






[jira] [Created] (HADOOP-19241) NoSuchMethodError in aws sdk third party logger in hadoop aws 3.4

2024-07-29 Thread ashutoshraina (Jira)
ashutoshraina created HADOOP-19241:
--

 Summary: NoSuchMethodError in aws sdk third party logger in hadoop 
aws 3.4
 Key: HADOOP-19241
 URL: https://issues.apache.org/jira/browse/HADOOP-19241
 Project: Hadoop Common
  Issue Type: Bug
  Components: hadoop-thirdparty, tools
Affects Versions: 3.4.0
Reporter: ashutoshraina


{code:java}
"localizedMessage": "java.lang.NoSuchMethodError: 
'software.amazon.awssdk.thirdparty.org.slf4j.Logger 
software.amazon.awssdk.utils.Logger.logger()'",
"message": "java.lang.NoSuchMethodError: 
'software.amazon.awssdk.thirdparty.org.slf4j.Logger 
software.amazon.awssdk.utils.Logger.logger()'",
"name": "com.google.common.util.concurrent.ExecutionError",
"cause": {
  "commonElementCount": 1,
  "localizedMessage": "'software.amazon.awssdk.thirdparty.org.slf4j.Logger 
software.amazon.awssdk.utils.Logger.logger()'",
  "message": "'software.amazon.awssdk.thirdparty.org.slf4j.Logger 
software.amazon.awssdk.utils.Logger.logger()'",
  "name": "java.lang.NoSuchMethodError",
  "extendedStackTrace": [
{
  "class": 
"software.amazon.awssdk.transfer.s3.internal.GenericS3TransferManager",
  "method": "close",
  "file": "GenericS3TransferManager.java",
  "line": 393,
  "exact": false,
  "location": "bundle-2.23.19.jar",
  "version": "?"
},
{
  "class": "org.apache.hadoop.fs.s3a.S3AUtils",
  "method": "closeAutocloseables",
  "file": "S3AUtils.java",
  "line": 1553,
  "exact": false,
  "location": "hadoop-aws-3.4.0.jar",
  "version": "?"
},
{
  "class": "org.apache.hadoop.fs.s3a.S3AFileSystem",
  "method": "stopAllServices",
  "file": "S3AFileSystem.java",
  "line": 4358,
  "exact": false,
  "location": "hadoop-aws-3.4.0.jar",
  "version": "?"
},
{
  "class": "org.apache.hadoop.fs.s3a.S3AFileSystem",
  "method": "initialize",
  "file": "S3AFileSystem.java",
  "line": 758,
  "exact": false,
  "location": "hadoop-aws-3.4.0.jar",
  "version": "?"
},
{
  "class": "org.apache.hadoop.fs.FileSystem",
  "method": "createFileSystem",
  "file": "FileSystem.java",
  "line": 3601,
  "exact": false,
  "location": "hadoop-common-3.4.0.jar",
  "version": "?"
},
{
  "class": "org.apache.hadoop.fs.FileSystem",
  "method": "get",
  "file": "FileSystem.java",
  "line": 552,
  "exact": false,
  "location": "hadoop-common-3.4.0.jar",
  "version": "?"
}, {code}
 

This appears to be related to how shading works in the AWS SDK bundle.

Versions: Hadoop-AWS 3.4, AWS-SDK-Bundle 2.23.19
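
One way to confirm a shading mismatch like this is to ask the JVM which jar each side of the conflict was loaded from; a small diagnostic sketch (not part of the original report):

{code:java}
// Diagnostic sketch: print the jar that provides each conflicting class.
public class WhichJar {
  public static void main(String[] args) throws Exception {
    for (String name : new String[] {
        "software.amazon.awssdk.utils.Logger",
        "software.amazon.awssdk.thirdparty.org.slf4j.Logger"}) {
      Class<?> c = Class.forName(name);
      // getCodeSource() is null for JDK classes, non-null for jar classes
      System.out.println(name + " -> "
          + c.getProtectionDomain().getCodeSource().getLocation());
    }
  }
}
{code}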

 






[jira] [Resolved] (HADOOP-19161) S3A: option "fs.s3a.performance.flags" to take list of performance flags

2024-07-29 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19161.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> S3A: option "fs.s3a.performance.flags" to take list of performance flags
> 
>
> Key: HADOOP-19161
> URL: https://issues.apache.org/jira/browse/HADOOP-19161
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> HADOOP-19072 shows we want to add more optimisations than that of 
> HADOOP-18930.
> * Extending the new optimisations to the existing option is brittle
> * Adding explicit options for each feature gets complex fast.
> Proposed
> * A new class S3APerformanceFlags keeps all the flags
> * it builds this from a string[] of values, which can be extracted from 
> getConf(),
> * and it can also support a "*" option to mean "everything"
> * this class can also be handed off to hasPathCapability() and do the right 
> thing.
> Proposed optimisations
> * create file (we will hook up HADOOP-18930)
> * mkdir (HADOOP-19072)
> * delete (probe for parent path)
> * rename (probe for source path)
> We could think of more, with different names, later.
> The goal is to make it possible to strip out every HTTP request we do for 
> safety/posix compliance, so applications have the option of turning off what 
> they don't need.
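
A minimal sketch of the proposed parsing (hypothetical names; the committed S3APerformanceFlags class may differ):

{code:java}
import java.util.EnumSet;
import java.util.Locale;

// Sketch only: parse fs.s3a.performance.flags, where "*" means everything.
public class S3APerformanceFlagsSketch {
  enum Flag { CREATE, MKDIR, DELETE, RENAME }

  static EnumSet<Flag> parse(String[] values) {
    EnumSet<Flag> flags = EnumSet.noneOf(Flag.class);
    for (String v : values) {
      String s = v.trim();
      if (s.isEmpty()) {
        continue;
      }
      if ("*".equals(s)) {
        return EnumSet.allOf(Flag.class);   // "everything"
      }
      flags.add(Flag.valueOf(s.toUpperCase(Locale.ROOT)));
    }
    return flags;
  }
}
{code}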






[jira] [Created] (HADOOP-19239) Enhance FileSystem to honor token and expiration in its cache

2024-07-27 Thread Xiang Li (Jira)
Xiang Li created HADOOP-19239:
-

 Summary: Enhance FileSystem to honor token and expiration in its 
cache
 Key: HADOOP-19239
 URL: https://issues.apache.org/jira/browse/HADOOP-19239
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 3.3.6
Reporter: Xiang Li
 Fix For: 3.3.4









[jira] [Created] (HADOOP-19238) Fix create-release script for arm64 based MacOS

2024-07-25 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-19238:
--

 Summary: Fix create-release script for arm64 based MacOS
 Key: HADOOP-19238
 URL: https://issues.apache.org/jira/browse/HADOOP-19238
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Mukund Thakur









[jira] [Created] (HADOOP-19237) upgrade dnsjava to 3.6.0 due to CVEs

2024-07-25 Thread PJ Fanning (Jira)
PJ Fanning created HADOOP-19237:
---

 Summary: upgrade dnsjava to 3.6.0 due to CVEs
 Key: HADOOP-19237
 URL: https://issues.apache.org/jira/browse/HADOOP-19237
 Project: Hadoop Common
  Issue Type: Task
Reporter: PJ Fanning


See https://github.com/apache/hadoop/pull/6955 - but this is missing the
necessary change to LICENSE-binary (which already has an out-of-date version
for dnsjava).

* CVE-2023-32695
* CVE-2024-25638
* https://github.com/advisories/GHSA-crjg-w57m-rqqf








[jira] [Created] (HADOOP-19236) Integration of Volcano Engine TOS in Hadoop.

2024-07-24 Thread Jinglun (Jira)
Jinglun created HADOOP-19236:


 Summary: Integration of Volcano Engine TOS in Hadoop.
 Key: HADOOP-19236
 URL: https://issues.apache.org/jira/browse/HADOOP-19236
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, tools
Reporter: Jinglun


Volcano Engine is a fast-growing cloud vendor launched by ByteDance, and TOS is
the object storage service of Volcano Engine. A common pattern is to store data
in TOS and run Hadoop/Spark/Flink applications that access it. But there is no
native support for TOS in Hadoop, so it is not easy for users to build their
big data systems on top of TOS.
 
This work aims to integrate TOS with Hadoop to help users run their
applications on TOS. Users only need to do some simple configuration, and then
their applications can read/write TOS without any code change. This work is
similar to the AWS S3, Azure Blob, AliyunOSS, Tencent COS and HuaweiCloud
Object Storage support in Hadoop.
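
As an illustration of the "simple configuration" goal, a hypothetical {{core-site.xml}} fragment (all property names and the implementation class are placeholders, since the connector is not yet merged):

{code:xml}
<!-- placeholders only, pending the actual patch -->
<property>
  <name>fs.tos.impl</name>
  <value>org.apache.hadoop.fs.tos.TosFileSystem</value>
</property>
<property>
  <name>fs.tos.endpoint</name>
  <value>tos-cn-beijing.volces.com</value>
</property>
{code}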






[jira] [Created] (HADOOP-19235) IPC client uses CompletableFuture to support asynchronous operations.

2024-07-23 Thread Jian Zhang (Jira)
Jian Zhang created HADOOP-19235:
---

 Summary: IPC client uses CompletableFuture to support asynchronous 
operations.
 Key: HADOOP-19235
 URL: https://issues.apache.org/jira/browse/HADOOP-19235
 Project: Hadoop Common
  Issue Type: New Feature
  Components: common
Reporter: Jian Zhang


h3. Description

The existing implementation of the asynchronous ipc.Client builds mainly on
HADOOP-13226, HDFS-10224, etc.

However, it does not support `CompletableFuture`; instead, it relies on setting
up callbacks, which can lead to the "callback hell" problem. Using
`CompletableFuture` organises asynchronous callbacks much better. Therefore, on
top of the existing implementation, once `client.call` completes, an
asynchronous thread handles the response of the call via `CompletableFuture`,
without blocking the main thread.
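
A small, self-contained sketch of the intended style (illustrative only; supplyAsync stands in for the real, yet-to-be-added asynchronous call):

{code:java}
import java.util.concurrent.CompletableFuture;

public class AsyncCallSketch {
  // Stand-in for a hypothetical client.callAsync(request).
  static CompletableFuture<String> callAsync() {
    return CompletableFuture.supplyAsync(() -> "response");
  }

  public static void main(String[] args) {
    // The future composes cleanly: no nested callbacks, and the response
    // is handled on an async thread, not the caller's thread.
    callAsync()
        .thenApply(String::toUpperCase)
        .thenAccept(r -> System.out.println("got " + r))
        .exceptionally(e -> { e.printStackTrace(); return null; })
        .join(); // only for the demo; real callers need not block
  }
}
{code}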

 

*Test*

new UT  TestAsyncIPC#testAsyncCallWithCompletableFuture()






[jira] [Created] (HADOOP-19234) ABFS: [FnsOverBlob] Adding Integration Tests for Special Scenarios in Blob Endpoint

2024-07-23 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-19234:
--

 Summary: ABFS: [FnsOverBlob] Adding Integration Tests for Special 
Scenarios in Blob Endpoint
 Key: HADOOP-19234
 URL: https://issues.apache.org/jira/browse/HADOOP-19234
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Anuj Modi


FNS accounts do not understand directories, so to provide that abstraction the
client has to handle the cases where HDFS operations interact with directory
paths. This needs some additional testing for each HDFS operation where a path
can exist as a directory.

More details to follow

Prerequisites:
 # HADOOP-19187 ABFS: [FnsOverBlob] Making AbfsClient Abstract for supporting
both DFS and Blob Endpoint
 # HADOOP-19207 ABFS: [FnsOverBlob] Response Handling of Blob Endpoint APIs and
Metadata APIs
 # HADOOP-19226 ABFS: [FnsOverBlob] Implementing Azure Rest APIs on Blob
Endpoint for AbfsBlobClient
 # HADOOP-19232 ABFS: [FnsOverBlob] Implementing Ingress Support with various
Fallback Handling
 # HADOOP-19233 ABFS: [FnsOverBlob] Implementing Rename and Delete APIs over
Blob Endpoint
 






[jira] [Created] (HADOOP-19233) ABFS: [FnsOverBlob] Implementing Rename and Delete APIs over Blob Endpoint

2024-07-23 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-19233:
--

 Summary: ABFS: [FnsOverBlob] Implementing Rename and Delete APIs 
over Blob Endpoint
 Key: HADOOP-19233
 URL: https://issues.apache.org/jira/browse/HADOOP-19233
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Anuj Modi
Assignee: Anuj Modi


Enable rename and delete over the Blob endpoint. The endpoint supports neither
a rename API nor directory delete. Therefore, all the orchestration and
handling has to be added on the client side.

More details will follow



Prerequisites for this patch:
1. HADOOP-19187 ABFS: [FnsOverBlob] Making AbfsClient Abstract for supporting
both DFS and Blob Endpoint

2. HADOOP-19226 ABFS: [FnsOverBlob] Implementing Azure Rest APIs on Blob
Endpoint for AbfsBlobClient

3. HADOOP-19207 ABFS: [FnsOverBlob] Response Handling of Blob Endpoint APIs and
Metadata APIs






[jira] [Created] (HADOOP-19232) ABFS: [FnsOverBlob] Implementing Ingress Support with various Fallback Handling

2024-07-23 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-19232:
--

 Summary: ABFS: [FnsOverBlob] Implementing Ingress Support with 
various Fallback Handling
 Key: HADOOP-19232
 URL: https://issues.apache.org/jira/browse/HADOOP-19232
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Anuj Modi
Assignee: Anmol Asrani


The scope of this task is to refactor the AbfsOutputStream class to handle
ingress for the DFS and Blob endpoints effectively.

More details will be added soon.

Prerequisites for this patch:
1. [HADOOP-19187] ABFS: [FnsOverBlob] Making AbfsClient Abstract for supporting
both DFS and Blob Endpoint

2. [HADOOP-19226] ABFS: [FnsOverBlob] Implementing Azure Rest APIs on Blob
Endpoint for AbfsBlobClient

3. [HADOOP-19207] ABFS: [FnsOverBlob] Response Handling of Blob Endpoint APIs
and Metadata APIs






[jira] [Created] (HADOOP-19231) add JacksonUtil to centralise some code

2024-07-21 Thread PJ Fanning (Jira)
PJ Fanning created HADOOP-19231:
---

 Summary: add JacksonUtil to centralise some code
 Key: HADOOP-19231
 URL: https://issues.apache.org/jira/browse/HADOOP-19231
 Project: Hadoop Common
  Issue Type: Task
Reporter: PJ Fanning


To future-proof Hadoop against Jackson changes, it makes sense to stop
creating ObjectMappers and JsonFactories in many different places in the
Hadoop code, and to centralise their creation in one utility.

One of the main drivers of this is 
https://www.javadoc.io/doc/com.fasterxml.jackson.core/jackson-core/latest/com/fasterxml/jackson/core/StreamReadConstraints.html
 

Jackson 3 (not yet scheduled for release) has some fairly big API and behaviour 
changes too.
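
A minimal sketch of what such a utility could centralise (assuming Jackson 2.15+ for StreamReadConstraints; the real JacksonUtil API may differ):

{code:java}
import com.fasterxml.jackson.core.JsonFactory;
import com.fasterxml.jackson.core.StreamReadConstraints;
import com.fasterxml.jackson.databind.ObjectMapper;

// Sketch: one place to create mappers, so stream-read constraints and any
// future Jackson 3 migration changes are applied consistently.
public final class JacksonUtil {
  private JacksonUtil() {}

  public static ObjectMapper createBasicObjectMapper() {
    JsonFactory factory = JsonFactory.builder()
        .streamReadConstraints(StreamReadConstraints.builder()
            .maxStringLength(Integer.MAX_VALUE)  // example tuning point
            .build())
        .build();
    return new ObjectMapper(factory);
  }
}
{code}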






[jira] [Resolved] (HADOOP-19228) ShellCommandFencer#setConfAsEnvVars should also replace '-' with '_'.

2024-07-20 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He resolved HADOOP-19228.
--
Fix Version/s: 3.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> ShellCommandFencer#setConfAsEnvVars should also replace '-' with '_'.
> -
>
> Key: HADOOP-19228
> URL: https://issues.apache.org/jira/browse/HADOOP-19228
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: fuchaohong
>Assignee: fuchaohong
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> When setting configuration into environment variables, '-' should also be 
> replaced with '_'.
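
A one-method sketch of the sanitisation:

{code:java}
final class EnvKeySketch {
  // Environment variable names allow neither '.' nor '-': map both to '_'.
  static String toEnvVarName(String confKey) {
    return confKey.replace('.', '_').replace('-', '_');
  }
  // e.g. a hypothetical "some.conf-key" becomes "some_conf_key"
}
{code}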






[jira] [Resolved] (HADOOP-19227) ipc.Server accelerate token negotiation only for the default mechanism

2024-07-20 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze resolved HADOOP-19227.
-
Fix Version/s: 3.3.7
 Hadoop Flags: Reviewed
   Resolution: Fixed

The pull request is now merged.

> ipc.Server accelerate token negotiation only for the default mechanism
> --
>
> Key: HADOOP-19227
> URL: https://issues.apache.org/jira/browse/HADOOP-19227
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Tsz-wo Sze
>Assignee: Tsz-wo Sze
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.7
>
>
> {code}
> //Server.java
>   // accelerate token negotiation by sending initial challenge
>   // in the negotiation response
>   if (enabledAuthMethods.contains(AuthMethod.TOKEN)) {
> ...
>   }
> {code}
> In Server.Connection.buildSaslNegotiateResponse() above, it accelerates token
> negotiation by sending an initial challenge in the negotiation response.
> However, this is a non-standard SASL negotiation. We should do it only for
> the default SASL mechanism.






[jira] [Created] (HADOOP-19230) upgrade to jackson 2.14.3

2024-07-19 Thread PJ Fanning (Jira)
PJ Fanning created HADOOP-19230:
---

 Summary: upgrade to jackson 2.14.3
 Key: HADOOP-19230
 URL: https://issues.apache.org/jira/browse/HADOOP-19230
 Project: Hadoop Common
  Issue Type: Task
  Components: common
Reporter: PJ Fanning


Follow up to HADOOP-18332

I have what I believe is a fix for the Jackson JAX-RS incompatibility.

https://github.com/pjfanning/jsr311-compat/

The reason that I want to start by just going to Jackson 2.14 is that Jackson 
has new StreamReadConstraints in Jackson 2.15 to protect against malicious JSON 
inputs. The constraints are generous but can cause issues with very large or 
deeply nested inputs.






[jira] [Created] (HADOOP-19229) Vector IO: have a max distance between ranges to range

2024-07-17 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19229:
---

 Summary: Vector IO: have a max distance between ranges to range
 Key: HADOOP-19229
 URL: https://issues.apache.org/jira/browse/HADOOP-19229
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.1
Reporter: Steve Loughran


Vector IO has a max size to coalesce ranges, but it also needs a maximum gap
between ranges to justify the merge.
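
A sketch of the proposed check (hypothetical names; the real merge logic lives in the vectored-read utilities):

{code:java}
// Sketch: merge two sorted, non-overlapping ranges only if the wasted gap
// is small enough AND the merged read stays under the existing size cap.
final class RangeMergeSketch {
  static final class Range {
    final long offset;
    final int length;
    Range(long offset, int length) { this.offset = offset; this.length = length; }
    long end() { return offset + length; }
  }

  static boolean shouldMerge(Range a, Range b, long maxGap, long maxMergedSize) {
    long gap = b.offset - a.end();        // bytes read and thrown away
    long merged = b.end() - a.offset;     // total read size if merged
    return gap >= 0 && gap <= maxGap && merged <= maxMergedSize;
  }
}
{code}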






[jira] [Resolved] (HADOOP-19218) Avoid DNS lookup while creating IPC Connection object

2024-07-16 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He resolved HADOOP-19218.
--
   Fix Version/s: 3.5.0
Hadoop Flags: Reviewed
Target Version/s:   (was: 3.3.9, 3.5.0, 3.4.1)
  Resolution: Fixed

> Avoid DNS lookup while creating IPC Connection object
> -
>
> Key: HADOOP-19218
> URL: https://issues.apache.org/jira/browse/HADOOP-19218
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: ipc
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Been running HADOOP-18628 in production for quite some time; everything works
> fine as long as the DNS servers in HA are available. Upgrading a single NS
> server at a time is also a common case, not problematic. Every DNS lookup
> takes ~1ms in general.
> However, recently we encountered a case where 2 out of 4 NS servers went down
> (temporarily, but it's a rare case). With a short-lived DNS cache and 2s of
> NS fallback timeout configured in resolv.conf, any client performing a DNS
> lookup could encounter a 4s+ delay. This caused a namenode outage, as the
> listener thread is single-threaded and could not keep up with the large
> number of unique clients (in direct proportion to the number of DNS
> resolutions every few seconds) initiating connections on the listener port.
> While having 2 out of 4 DNS servers offline is a rare case, and the NS
> fallback settings could also be improved, it is important to note that we
> don't need to perform DNS resolution for every new connection if the
> intention is to improve the insights into VersionMismatch errors thrown by
> the server.
> The proposal is to delay the DNS resolution until the server throws the
> error for an incompatible header or version mismatch. This would also avoid
> the ~1ms spent even on a healthy DNS lookup.






[jira] [Created] (HADOOP-19227) ipc.Server accelerate token negotiation only for the default mechanism

2024-07-16 Thread Tsz-wo Sze (Jira)
Tsz-wo Sze created HADOOP-19227:
---

 Summary: ipc.Server accelerate token negotiation only for the 
default mechanism
 Key: HADOOP-19227
 URL: https://issues.apache.org/jira/browse/HADOOP-19227
 Project: Hadoop Common
  Issue Type: Improvement
  Components: ipc
Reporter: Tsz-wo Sze
Assignee: Tsz-wo Sze


{code}
//Server.java
  // accelerate token negotiation by sending initial challenge
  // in the negotiation response
  if (enabledAuthMethods.contains(AuthMethod.TOKEN)) {
...
  }
{code}
In Server.Connection.buildSaslNegotiateResponse() above, it accelerates token
negotiation by sending an initial challenge in the negotiation response.
However, this is a non-standard SASL negotiation. We should do it only for the
default SASL mechanism.






[jira] [Created] (HADOOP-19226) ABFS: Implementing Azure Rest APIs on Blob Endpoint for AbfsBlobClient

2024-07-14 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-19226:
--

 Summary: ABFS: Implementing Azure Rest APIs on Blob Endpoint for 
AbfsBlobClient
 Key: HADOOP-19226
 URL: https://issues.apache.org/jira/browse/HADOOP-19226
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Anuj Modi


This is the second task in a series of tasks for implementing Blob endpoint
support for FNS accounts.

This patch will contain the changes to implement all the APIs over the Blob
endpoint, as part of implementing AbfsBlobClient.






[jira] [Created] (HADOOP-19225) Upgrade to jetty 9.4.55 due to CVE

2024-07-12 Thread Palakur Eshwitha Sai (Jira)
Palakur Eshwitha Sai created HADOOP-19225:
-

 Summary: Upgrade to jetty 9.4.55 due to CVE
 Key: HADOOP-19225
 URL: https://issues.apache.org/jira/browse/HADOOP-19225
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Palakur Eshwitha Sai
Assignee: Palakur Eshwitha Sai









[jira] [Resolved] (HADOOP-19222) Switch yum repo baseurl due to CentOS 7 sunset

2024-07-11 Thread Shilun Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shilun Fan resolved HADOOP-19222.
-
Hadoop Flags: Reviewed
  Resolution: Fixed

> Switch yum repo baseurl due to CentOS 7 sunset
> --
>
> Key: HADOOP-19222
> URL: https://issues.apache.org/jira/browse/HADOOP-19222
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Similar to HADOOP-18151 (which handled the sunset of CentOS 8), CentOS 7
> reached EOL on July 1, 2024.






[jira] [Created] (HADOOP-19224) Upgrade esdk to the latest version 3.24.3

2024-07-11 Thread melin (Jira)
melin created HADOOP-19224:
--

 Summary: Upgrade esdk to the latest version 3.24.3
 Key: HADOOP-19224
 URL: https://issues.apache.org/jira/browse/HADOOP-19224
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/huawei
Reporter: melin


The current version relies on okhttp 3.x; we would like to upgrade to the
latest version, which relies on okhttp 4.12.






[jira] [Created] (HADOOP-19223) Don't fail CI if no tests are changed

2024-07-10 Thread Cheng Pan (Jira)
Cheng Pan created HADOOP-19223:
--

 Summary: Don't fail CI if no tests are changed
 Key: HADOOP-19223
 URL: https://issues.apache.org/jira/browse/HADOOP-19223
 Project: Hadoop Common
  Issue Type: Wish
Reporter: Cheng Pan









[jira] [Created] (HADOOP-19222) Switch yum repo baseurl due to CentOS 7 sunset

2024-07-09 Thread Cheng Pan (Jira)
Cheng Pan created HADOOP-19222:
--

 Summary: Switch yum repo baseurl due to CentOS 7 sunset
 Key: HADOOP-19222
 URL: https://issues.apache.org/jira/browse/HADOOP-19222
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Cheng Pan


Similar to HADOOP-18151 (which handled the sunset of CentOS 8), CentOS 7
reached EOL on July 1, 2024.






[jira] [Resolved] (HADOOP-13463) update to Guice 4.1

2024-07-09 Thread Cheng Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13463?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Pan resolved HADOOP-13463.

Resolution: Won't Do

Replaced with HADOOP-19216

> update to Guice 4.1
> ---
>
> Key: HADOOP-13463
> URL: https://issues.apache.org/jira/browse/HADOOP-13463
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Affects Versions: 3.0.0-alpha1
>Reporter: Sean Busbey
>Priority: Minor
>
> Right now trunk uses Guice 4.0, which is about a year old. We should update 
> to 4.1, so long as we're making the jump from 3 to 4 in the branch-2 -> 3.0 
> transition.






[jira] [Resolved] (HADOOP-19220) S3A : S3AInputStream positioned readFully Expectation

2024-07-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19220.
-
Resolution: Works for Me

It works for me, and for the people whose support calls would ruin my life if
it didn't work for them.

You have probably done something in your mocking test setup that does not
match what the S3A FS does. My recommendation is: step through the failing
test with a debugger.

I'm not going to look at the code, because the way to do anything like that
would be to share it as a GitHub reference. But anyway, this is not a
jira-class issue - not yet, anyway. This is the kind of problem to raise on
the developer mailing list.

For that reason, I'm going to close it as a WORKSFORME. Sorry. Stick the code
on GitHub as a gist or something and discuss on the hadoop developer list. If
it really is a bug in the S3A FS code, this jira can be re-opened.

> S3A : S3AInputStream positioned readFully Expectation
> -
>
> Key: HADOOP-19220
> URL: https://issues.apache.org/jira/browse/HADOOP-19220
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Reporter: Vinay Devadiga
>Priority: Major
>
> So basically I was trying to write some unit tests for the S3AInputStream
> readFully method:
> package org.apache.hadoop.fs.s3a;
> import java.io.EOFException;
> import java.io.FilterInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.net.SocketException;
> import java.net.URI;
> import java.nio.ByteBuffer;
> import java.nio.charset.Charset;
> import java.nio.charset.StandardCharsets;
> import java.util.concurrent.CompletableFuture;
> import java.util.concurrent.TimeUnit;
> import org.apache.commons.io.IOUtils;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.s3a.audit.impl.NoopSpan;
> import org.apache.hadoop.fs.s3a.auth.delegation.EncryptionSecrets;
> import org.apache.hadoop.util.BlockingThreadPoolExecutorService;
> import org.apache.hadoop.util.functional.CallableRaisingIOE;
> import org.assertj.core.api.Assertions;
> import org.junit.Before;
> import org.junit.Test;
> import software.amazon.awssdk.awscore.exception.AwsErrorDetails;
> import software.amazon.awssdk.awscore.exception.AwsServiceException;
> import software.amazon.awssdk.core.ResponseInputStream;
> import software.amazon.awssdk.http.AbortableInputStream;
> import software.amazon.awssdk.services.s3.S3Client;
> import software.amazon.awssdk.services.s3.model.GetObjectRequest;
> import software.amazon.awssdk.services.s3.model.GetObjectResponse;
> import static java.lang.Math.min;
> import static java.nio.charset.StandardCharsets.UTF_8;
> import static org.apache.hadoop.fs.s3a.Constants.ASYNC_DRAIN_THRESHOLD;
> import static org.apache.hadoop.fs.s3a.Constants.AWS_REGION;
> import static org.apache.hadoop.fs.s3a.Constants.FS_S3A;
> import static org.apache.hadoop.fs.s3a.Constants.MULTIPART_MIN_SIZE;
> import static org.apache.hadoop.fs.s3a.Constants.S3_CLIENT_FACTORY_IMPL;
> import static org.apache.hadoop.util.functional.FutureIO.eval;
> import static org.assertj.core.api.Assertions.assertThat;
> import static 
> org.assertj.core.api.AssertionsForClassTypes.assertThatExceptionOfType;
> import static org.mockito.ArgumentMatchers.any;
> import static org.mockito.Mockito.never;
> import static org.mockito.Mockito.verify;
> public class TestReadFullyAndPositionalRead {
> private S3AFileSystem fs;
> private S3AInputStream input;
> private S3Client s3;
> private static final String EMPTY = "";
> private static final String INPUT = "test_content";
> @Before
> public void setUp() throws IOException {
> Configuration conf = createConfiguration();
> fs = new S3AFileSystem();
> URI uri = URI.create(FS_S3A + "://" + MockS3AFileSystem.BUCKET);
> // Unset S3CSE property from config to avoid pathIOE.
> conf.unset(Constants.S3_ENCRYPTION_ALGORITHM);
> fs.initialize(uri, conf);
> s3 = fs.getS3AInternals().getAmazonS3Client("mocking");
> }
> public Configuration createConfiguration() {
> Configuration conf = new Configuration();
> conf.setClass(S3_CLIENT_FACTORY_IMPL, MockS3ClientFactory.class, 
> S3ClientFactory.class);
> // use minimum multipart size for faster triggering
> conf.setLong(Constants.MULTIPART_SIZE, MULTIPART_MIN_SIZE);
> conf.setInt(Constants.S3A_BUCKET_PROBE, 1);
> // this is so stream draining

[jira] [Created] (HADOOP-19221) S3a: retry on 400 +ErrorCode RequestTimeout

2024-07-08 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19221:
---

 Summary: S3a: retry on 400 +ErrorCode RequestTimeout
 Key: HADOOP-19221
 URL: https://issues.apache.org/jira/browse/HADOOP-19221
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


If a slow block upload takes too long, the connection is broken on the S3 side
with an error message, returned as a 400 response:

{code}
Your socket connection to the server was not read from or written to within the 
timeout period. Idle connections will be closed. (Service: Amazon S3; Status 
Code: 400; Error Code: RequestTimeout; Request ID:; S3 Extended Request ID:
{code}

This is recoverable and should be treated as such, either using the normal
exception policy or maybe even the throttle policy.
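
A sketch of the classification, using AWS SDK v2 types (the actual change would go through the S3A retry policy):

{code:java}
import software.amazon.awssdk.awscore.exception.AwsServiceException;

// Sketch: unlike most 400 responses, 400 + RequestTimeout is transient,
// so classify it as retryable before any generic "400 == fail" rule.
final class RequestTimeoutSketch {
  static boolean isRetryableRequestTimeout(AwsServiceException e) {
    return e.statusCode() == 400
        && e.awsErrorDetails() != null
        && "RequestTimeout".equals(e.awsErrorDetails().errorCode());
  }
}
{code}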








[jira] [Resolved] (HADOOP-19195) Upgrade aws sdk v2 to 2.25.53

2024-07-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19195.
-
Fix Version/s: 3.4.1
   Resolution: Fixed

merged to 3.4 and trunk branches

Harshit, can you leave the "fix version" field blank; use target version to 
indicate which version it is aimed at. We use the fix version to track which 
versions it has actually been merged into, and for the automated release note 
generation. thanks

> Upgrade aws sdk v2 to 2.25.53
> -
>
> Key: HADOOP-19195
> URL: https://issues.apache.org/jira/browse/HADOOP-19195
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Harshit Gupta
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Upgrade aws sdk v2 to 2.25.53






[jira] [Created] (HADOOP-19220) S3A : S3AInputStream positioned readFully Expectation

2024-07-07 Thread Vinay Devadiga (Jira)
Vinay Devadiga created HADOOP-19220:
---

 Summary: S3A : S3AInputStream positioned readFully Expectation
 Key: HADOOP-19220
 URL: https://issues.apache.org/jira/browse/HADOOP-19220
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Reporter: Vinay Devadiga


So basically I was trying to write some unit tests for the S3AInputStream
readFully method:

package org.apache.hadoop.fs.s3a;

import java.io.EOFException;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.SocketException;
import java.net.URI;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;

import org.apache.commons.io.IOUtils;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.audit.impl.NoopSpan;
import org.apache.hadoop.fs.s3a.auth.delegation.EncryptionSecrets;
import org.apache.hadoop.util.BlockingThreadPoolExecutorService;
import org.apache.hadoop.util.functional.CallableRaisingIOE;
import org.assertj.core.api.Assertions;
import org.junit.Before;
import org.junit.Test;
import software.amazon.awssdk.awscore.exception.AwsErrorDetails;
import software.amazon.awssdk.awscore.exception.AwsServiceException;
import software.amazon.awssdk.core.ResponseInputStream;
import software.amazon.awssdk.http.AbortableInputStream;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.GetObjectResponse;

import static java.lang.Math.min;
import static java.nio.charset.StandardCharsets.UTF_8;
import static org.apache.hadoop.fs.s3a.Constants.ASYNC_DRAIN_THRESHOLD;
import static org.apache.hadoop.fs.s3a.Constants.AWS_REGION;
import static org.apache.hadoop.fs.s3a.Constants.FS_S3A;
import static org.apache.hadoop.fs.s3a.Constants.MULTIPART_MIN_SIZE;
import static org.apache.hadoop.fs.s3a.Constants.S3_CLIENT_FACTORY_IMPL;
import static org.apache.hadoop.util.functional.FutureIO.eval;
import static org.assertj.core.api.Assertions.assertThat;
import static 
org.assertj.core.api.AssertionsForClassTypes.assertThatExceptionOfType;
import static org.mockito.ArgumentMatchers.any;
import static org.mockito.Mockito.never;
import static org.mockito.Mockito.verify;

public class TestReadFullyAndPositionalRead {

private S3AFileSystem fs;
private S3AInputStream input;
private S3Client s3;
private static final String EMPTY = "";
private static final String INPUT = "test_content";

@Before
public void setUp() throws IOException {
Configuration conf = createConfiguration();
fs = new S3AFileSystem();
URI uri = URI.create(FS_S3A + "://" + MockS3AFileSystem.BUCKET);
// Unset S3CSE property from config to avoid pathIOE.
conf.unset(Constants.S3_ENCRYPTION_ALGORITHM);
fs.initialize(uri, conf);
s3 = fs.getS3AInternals().getAmazonS3Client("mocking");
}

public Configuration createConfiguration() {
Configuration conf = new Configuration();
conf.setClass(S3_CLIENT_FACTORY_IMPL, MockS3ClientFactory.class, 
S3ClientFactory.class);
// use minimum multipart size for faster triggering
conf.setLong(Constants.MULTIPART_SIZE, MULTIPART_MIN_SIZE);
conf.setInt(Constants.S3A_BUCKET_PROBE, 1);
// this is so stream draining is always blocking, allowing assertions
// to be safely made without worrying about any race conditions
conf.setInt(ASYNC_DRAIN_THRESHOLD, Integer.MAX_VALUE);
// set the region to avoid the getBucketLocation on FS init.
conf.set(AWS_REGION, "eu-west-1");
return conf;
}

@Test
public void testReadFullyFromBeginning() throws IOException {
input = getMockedS3AInputStream(INPUT);
byte[] byteArray = new byte[INPUT.length()];
input.readFully(0, byteArray, 0, byteArray.length);
assertThat(new String(byteArray, UTF_8)).isEqualTo(INPUT);
}

@Test
public void testReadFullyWithOffsetAndLength() throws IOException {
input = getMockedS3AInputStream(INPUT);
byte[] byteArray = new byte[4];
input.readFully(5, byteArray, 0, 4);
assertThat(new String(byteArray, UTF_8)).isEqualTo("cont");
}

@Test
public void testReadFullyWithOffsetBeyondStream() throws IOException {
input = getMockedS3AInputStream(INPUT);
byte[] byteArray = new byte[10];
assertThatExceptionOfType(EOFException.class)
.isThrownBy(() -> input.readFully(20, byteArray, 0, 10));
}

private S3AInputStream getMockedS3AInputStream(String input) {
Path path = new Path(

[jira] [Resolved] (HADOOP-19216) Upgrade Guice from 4.0 to 5.1.0 to support Java 17

2024-07-06 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HADOOP-19216.
---
Fix Version/s: 3.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Upgrade Guice from 4.0 to 5.1.0 to support Java 17
> --
>
> Key: HADOOP-19216
> URL: https://issues.apache.org/jira/browse/HADOOP-19216
> Project: Hadoop Common
>  Issue Type: Task
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>







[jira] [Resolved] (HADOOP-19205) S3A initialization/close slower than with v1 SDK

2024-07-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19205.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> S3A initialization/close slower than with v1 SDK
> 
>
> Key: HADOOP-19205
> URL: https://issues.apache.org/jira/browse/HADOOP-19205
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
> Attachments: Screenshot 2024-06-14 at 17.12.59.png, Screenshot 
> 2024-06-14 at 17.14.33.png
>
>
> Hive QE have observed slowdown in LLAP queries due to time to create and 
> close s3a filesystems instances. A key aspect of that is they keep closing 
> the fs instances (HIVE-27884), but looking at the profiles, the reason things 
> seem to have regressed is
> * two s3 clients are being created (sync and async)
> * these seem to take a lot of time scanning the classpath for "global 
> interceptors", which is at least an O(jars) operation; #of index entries in 
> the zip files may factor too.
> Proposed:
> * create async client on demand when the transfer manager is invoked
> * look at why passwords are being scanned for if 
> InstanceProfileCredentialsProvider is in use...that seems slow too
> SDK wishes
> * SDK maybe allow us to turn off that scan for interceptors?
> attaching screenshots of the profile. storediag snippet:
> {code}
> [001]  fs.s3a.access.key = (unset)
> [002]  fs.s3a.secret.key = (unset)
> [003]  fs.s3a.session.token = (unset)
> [004]  fs.s3a.server-side-encryption-algorithm = (unset)
> [005]  fs.s3a.server-side-encryption.key = (unset)
> [006]  fs.s3a.encryption.algorithm = (unset)
> [007]  fs.s3a.encryption.key = (unset)
> [008]  fs.s3a.aws.credentials.provider = 
> "com.amazonaws.auth.InstanceProfileCredentialsProvider" [core-site.xml]
> {code}






[jira] [Resolved] (HADOOP-19215) Fix unit tests testSlowConnection and testBadSetup failed in TestRPC

2024-07-05 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HADOOP-19215.
---
Fix Version/s: 3.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Fix unit tests testSlowConnection and testBadSetup failed in TestRPC
> 
>
> Key: HADOOP-19215
> URL: https://issues.apache.org/jira/browse/HADOOP-19215
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: test
>Affects Versions: 3.4.0
>Reporter: farmmamba
>Assignee: farmmamba
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Fix the unit tests testSlowConnection and testBadSetup failing in TestRPC.
> We should use ProtobufRpcEngine2 as the ProtocolEngine.






[jira] [Created] (HADOOP-19219) Resolve Certificate error in Hadoop-auth tests.

2024-07-04 Thread Muskan Mishra (Jira)
Muskan Mishra created HADOOP-19219:
--

 Summary: Resolve Certificate error in Hadoop-auth tests.
 Key: HADOOP-19219
 URL: https://issues.apache.org/jira/browse/HADOOP-19219
 Project: Hadoop Common
  Issue Type: Sub-task
Reporter: Muskan Mishra


While compiling Hadoop trunk with JDK 17, we faced the following errors in the
TestMultiSchemeAuthenticationHandler and
TestLdapAuthenticationHandler classes.


{code:java}
[INFO] Running 
org.apache.hadoop.security.authentication.server.TestMultiSchemeAuthenticationHandler
[ERROR] Tests run: 1, Failures: 0, Errors: 1, Skipped: 0, Time elapsed: 1.256 s 
<<< FAILURE! - in 
org.apache.hadoop.security.authentication.server.TestMultiSchemeAuthenticationHandler
[ERROR] 
org.apache.hadoop.security.authentication.server.TestMultiSchemeAuthenticationHandler
  Time elapsed: 1.255 s  <<< ERROR!
java.lang.IllegalAccessError: class 
org.apache.directory.server.core.security.CertificateUtil (in unnamed module 
@0x32e614e9) cannot access class sun.security.x509.X500Name (in module 
java.base) because module java.base does not export sun.security.x509 to 
unnamed module @0x32e614e9
at 
org.apache.directory.server.core.security.CertificateUtil.createTempKeyStore(CertificateUtil.java:334)
at 
org.apache.directory.server.factory.ServerAnnotationProcessor.instantiateLdapServer(ServerAnnotationProcessor.java:158)
at 
org.apache.directory.server.factory.ServerAnnotationProcessor.createLdapServer(ServerAnnotationProcessor.java:318)
at 
org.apache.directory.server.factory.ServerAnnotationProcessor.createLdapServer(ServerAnnotationProcessor.java:351)
 {code}
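
A likely workaround is to open the JDK-internal package to the test JVM; a sketch for maven-surefire (the exact pom location and any existing argLine would need checking):

{code:xml}
<!-- sketch: let ApacheDS's CertificateUtil reach sun.security.x509 -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-surefire-plugin</artifactId>
  <configuration>
    <argLine>--add-exports java.base/sun.security.x509=ALL-UNNAMED</argLine>
  </configuration>
</plugin>
{code}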






[jira] [Created] (HADOOP-19218) Avoid DNS lookup while creating IPC Connection object

2024-07-02 Thread Viraj Jasani (Jira)
Viraj Jasani created HADOOP-19218:
-

 Summary: Avoid DNS lookup while creating IPC Connection object
 Key: HADOOP-19218
 URL: https://issues.apache.org/jira/browse/HADOOP-19218
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Viraj Jasani


Been running HADOOP-18628 in production for quite some time; everything works
fine as long as the DNS servers in HA are available. Upgrading a single NS
server at a time is also a common case, not problematic.

However, recently we encountered a case where 2 out of 4 NS servers went down
(temporarily, but it's a rare case). With a short-lived DNS cache and 2s of NS
fallback timeout configured in resolv.conf, any client performing a DNS lookup
could encounter a 4s+ delay. This caused a namenode outage, as the listener
thread is single-threaded and could not keep up with the large number of
unique clients (in direct proportion to the number of DNS resolutions every
few seconds) initiating connections on the listener port.

While having 2 out of 4 DNS servers offline is a rare case, and the NS
fallback settings could also be improved, it is important to note that we
don't need to perform DNS resolution for every new connection if the intention
is to improve the insights into VersionMismatch errors thrown by the server.

The proposal is to delay the DNS resolution until the server throws the error
for an incompatible header or version mismatch.
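
A sketch of the proposed laziness (hypothetical helper methods; the real change is in the IPC client's Connection setup):

{code:java}
import java.net.InetSocketAddress;

// Sketch: keep the server address unresolved when the Connection object
// is created, and resolve only on the rare error path.
final class LazyResolveSketch {
  static InetSocketAddress connectAddress(String host, int port) {
    return InetSocketAddress.createUnresolved(host, port); // no DNS lookup
  }

  static InetSocketAddress resolveForErrorMessage(InetSocketAddress addr) {
    // the only place that pays the DNS cost, e.g. when reporting a
    // version mismatch or incompatible header from the server
    return new InetSocketAddress(addr.getHostString(), addr.getPort());
  }
}
{code}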






[jira] [Resolved] (HADOOP-19210) s3a: TestS3AAWSCredentialsProvider and TestS3AInputStreamRetry really slow

2024-07-02 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19210.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> s3a: TestS3AAWSCredentialsProvider and TestS3AInputStreamRetry really slow
> --
>
> Key: HADOOP-19210
> URL: https://issues.apache.org/jira/browse/HADOOP-19210
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.5.0, 3.4.1
>
>
> Not noticed this before, but the unit tests TestS3AAWSCredentialsProvider and 
> TestS3AInputStreamRetry are so slow they will be hurting overall test 
> performance times: no integration tests will start until these are all 
> complete.
> {code}
> mvn test -T 1C -Dparallel-tests
> ...
> [INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 37.877 
> s - in org.apache.hadoop.fs.s3a.TestS3AInputStreamRetry
> ...
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 90.038 s - in org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider
> {code}
> The PR cuts total execution time of a 10 thread test run from 3 minutes to 
> 2:30






[jira] [Created] (HADOOP-19217) Introduce getTrashPolicy to FileSystem API

2024-07-02 Thread Ivan Andika (Jira)
Ivan Andika created HADOOP-19217:


 Summary: Introduce getTrashPolicy to FileSystem API
 Key: HADOOP-19217
 URL: https://issues.apache.org/jira/browse/HADOOP-19217
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Reporter: Ivan Andika


Hadoop FileSystem supports awareness of multiple FileSystem implementations
(e.g. a client can be aware of both the hdfs:// and ofs:// protocols).

However, the Hadoop TrashPolicy currently remains the same regardless of the
URI scheme. The TrashPolicy is governed by the "fs.trash.classname"
configuration and stays the same regardless of the FileSystem implementation.
For example, HDFS defaults to TrashPolicyDefault and Ozone defaults to
TrashPolicyOzone, but only one will be picked, since the configuration will be
overwritten by the other.

Therefore, I propose to couple the TrashPolicy to each specific FileSystem
implementation by introducing a new FileSystem#getTrashPolicy.
TrashPolicy#getInstance can call FileSystem#getTrashPolicy to get the
appropriate TrashPolicy.
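
A method-level sketch of the proposed hook (not committed API):

{code:java}
// Sketch: FileSystem gains an overridable factory method, so e.g. an Ozone
// FileSystem can return TrashPolicyOzone while HDFS keeps TrashPolicyDefault,
// whatever a single shared fs.trash.classname would have said.
public TrashPolicy getTrashPolicy(Configuration conf) {
  // default preserves today's behaviour, driven by fs.trash.classname
  return ReflectionUtils.newInstance(
      conf.getClass("fs.trash.classname",
          TrashPolicyDefault.class, TrashPolicy.class),
      conf);
}
{code}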






[jira] [Created] (HADOOP-19216) Upgrade Guice from 4.0 to 5.1.0 to support Java 17

2024-07-01 Thread Cheng Pan (Jira)
Cheng Pan created HADOOP-19216:
--

 Summary: Upgrade Guice from 4.0 to 5.1.0 to support Java 17
 Key: HADOOP-19216
 URL: https://issues.apache.org/jira/browse/HADOOP-19216
 Project: Hadoop Common
  Issue Type: Task
Reporter: Cheng Pan









[jira] [Created] (HADOOP-19215) Fix unit tests testSlowConnection and testBadSetup failed in TestRPC

2024-06-30 Thread farmmamba (Jira)
farmmamba created HADOOP-19215:
--

 Summary: Fix unit tests testSlowConnection and testBadSetup failed 
in TestRPC
 Key: HADOOP-19215
 URL: https://issues.apache.org/jira/browse/HADOOP-19215
 Project: Hadoop Common
  Issue Type: Bug
  Components: test
Affects Versions: 3.4.0
Reporter: farmmamba
Assignee: farmmamba


Fix the unit tests testSlowConnection and testBadSetup failing in TestRPC.

We should use ProtobufRpcEngine2 as the ProtocolEngine.






[jira] [Created] (HADOOP-19214) Invalid GPG commands in Releases page

2024-06-29 Thread Attila Doroszlai (Jira)
Attila Doroszlai created HADOOP-19214:
-

 Summary: Invalid GPG commands in Releases page
 Key: HADOOP-19214
 URL: https://issues.apache.org/jira/browse/HADOOP-19214
 Project: Hadoop Common
  Issue Type: Bug
  Components: website
Reporter: Attila Doroszlai
Assignee: Attila Doroszlai


Instructions on the [Download page|https://hadoop.apache.org/releases.html] show
GPG commands with {{--}} converted to an en dash ({{–}}), which makes the
commands invalid.

{code}
gpg –import KEYS
gpg –verify hadoop-X.Y.Z-src.tar.gz.asc
{code}
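
For reference, the intended commands use a double hyphen:

{code}
gpg --import KEYS
gpg --verify hadoop-X.Y.Z-src.tar.gz.asc
{code}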






[jira] [Created] (HADOOP-19213) testUpdateDeepDirectoryStructureToRemote intermittent failures

2024-06-28 Thread Pranav Saxena (Jira)
Pranav Saxena created HADOOP-19213:
--

 Summary: testUpdateDeepDirectoryStructureToRemote intermittent 
failures
 Key: HADOOP-19213
 URL: https://issues.apache.org/jira/browse/HADOOP-19213
 Project: Hadoop Common
  Issue Type: Bug
  Components: tools/distcp
Reporter: Pranav Saxena


The test testUpdateDeepDirectoryStructureToRemote intermittently fails. The
following is an instance from ABFS test runs:

```
[ERROR] 
testUpdateDeepDirectoryStructureToRemote(org.apache.hadoop.fs.azurebfs.contract.ITestAbfsFileSystemContractDistCp)
 Time elapsed: 2.951 s <<< FAILURE!
java.lang.AssertionError: Files Copied value 2 above maximum 1
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at 
org.apache.hadoop.tools.contract.AbstractContractDistCpTest.assertCounterInRange(AbstractContractDistCpTest.java:294)
at 
org.apache.hadoop.tools.contract.AbstractContractDistCpTest.distCpUpdateDeepDirectoryStructure(AbstractContractDistCpTest.java:334)
at 
org.apache.hadoop.tools.contract.AbstractContractDistCpTest.testUpdateDeepDirectoryStructureToRemote(AbstractContractDistCpTest.java:259)
```

There is one JIRA in Apache Ozone which was raised for an S3 test run:
https://issues.apache.org/jira/browse/HDDS-10616

 

```
org.apache.hadoop.fs.contract.s3a.ITestS3AContractDistCp
testUpdateDeepDirectoryStructureToRemote(org.apache.hadoop.fs.contract.s3a.ITestS3AContractDistCp)
 Time elapsed: 2.375 s <<< FAILURE!
java.lang.AssertionError: Files Copied value 2 above maximum 1
at org.junit.Assert.fail(Assert.java:89)
at org.junit.Assert.assertTrue(Assert.java:42)
at 
org.apache.hadoop.tools.contract.AbstractContractDistCpTest.assertCounterInRange(AbstractContractDistCpTest.java:294)
at 
org.apache.hadoop.tools.contract.AbstractContractDistCpTest.distCpUpdateDeepDirectoryStructure(AbstractContractDistCpTest.java:334)
at 
org.apache.hadoop.tools.contract.AbstractContractDistCpTest.testUpdateDeepDirectoryStructureToRemote(AbstractContractDistCpTest.java:259)
```



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19211) AliyunOSS: Support vectored read API

2024-06-27 Thread wujinhu (Jira)
wujinhu created HADOOP-19211:


 Summary: AliyunOSS: Support vectored read API
 Key: HADOOP-19211
 URL: https://issues.apache.org/jira/browse/HADOOP-19211
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/oss
Affects Versions: 3.3.6, 3.2.4
Reporter: wujinhu
Assignee: wujinhu






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19210) s3a: TestS3AAWSCredentialsProvider and TestS3AInputStreamRetry really slow

2024-06-26 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19210:
---

 Summary: s3a: TestS3AAWSCredentialsProvider and 
TestS3AInputStreamRetry really slow
 Key: HADOOP-19210
 URL: https://issues.apache.org/jira/browse/HADOOP-19210
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.5.0
Reporter: Steve Loughran


Not noticed this before, but the unit tests TestS3AAWSCredentialsProvider and 
TestS3AInputStreamRetry are so slow they will be hurting overall test 
performance: no integration tests will start until these are all complete.


{code}

mvn test -T 1C -Dparallel-tests

...
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 37.877 s 
- in org.apache.hadoop.fs.s3a.TestS3AInputStreamRetry
...
[INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 90.038 
s - in org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19194) Add test to find unshaded dependencies in the aws sdk

2024-06-24 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19194.
-
Fix Version/s: 3.4.1
   Resolution: Fixed

This highlights how many unshaded artifacts are in that bundle.jar, the one we 
use precisely to avoid classpath problems, especially with the aws sdk trying 
to dictate the jackson library.

h2. Should we give up shipping it?

yes: it's tainted; things like netty are still there
no: at least jackson is shaded.

I'm not happy about the slf4j or netty classes.
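
A minimal sketch of such a scan (the package prefix and class name are illustrative, not the actual HADOOP-19194 test):

{code:java}
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

public class UnshadedDependencyScan {

  /** Collect class entries in the bundle jar outside the SDK's own namespace. */
  public static List<String> findUnshadedClasses(String bundleJarPath)
      throws Exception {
    List<String> unshaded = new ArrayList<>();
    try (JarFile jar = new JarFile(bundleJarPath)) {
      Enumeration<JarEntry> entries = jar.entries();
      while (entries.hasMoreElements()) {
        String name = entries.nextElement().getName();
        // Only class files can cause classpath conflicts.
        if (name.endsWith(".class")
            && !name.startsWith("software/amazon/")) {
          unshaded.add(name);   // e.g. io/netty/..., org/slf4j/...
        }
      }
    }
    return unshaded;
  }
}
{code}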

> Add test to find unshaded dependencies in the aws sdk
> -
>
> Key: HADOOP-19194
> URL: https://issues.apache.org/jira/browse/HADOOP-19194
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Harshit Gupta
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Write a test to assess the aws sdk for unshaded artefacts on the class path 
> which might cause deployment failures. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19204) VectorIO regression: empty ranges are now rejected

2024-06-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19204.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> VectorIO regression: empty ranges are now rejected
> --
>
> Key: HADOOP-19204
> URL: https://issues.apache.org/jira/browse/HADOOP-19204
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 3.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The validation now rejects a readVectored() call with an empty range list, 
> whereas before it was a no-op.
> Proposed fix: return the empty list; add a test.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19206) Hadoop release contains a 530MB bundle-2.23.19.jar

2024-06-20 Thread Tsz-wo Sze (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tsz-wo Sze resolved HADOOP-19206.
-
Resolution: Duplicate

Resolving this as a duplicate of HADOOP-19083.

> Hadoop release contains a 530MB bundle-2.23.19.jar
> --
>
> Key: HADOOP-19206
> URL: https://issues.apache.org/jira/browse/HADOOP-19206
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Tsz-wo Sze
>Priority: Major
>
> The size of Hadoop binary release (v3.4.0) is 1.7 GB.
> {code:java}
> hadoop-3.4.0$du -h -d 1
> $du -h -d 1 .
> 2.0M  ./bin
> 260K  ./libexec
>  72K  ./include
> 212K  ./sbin
> 184K  ./etc
> 232K  ./licenses-binary
> 316M  ./lib
> 1.4G  ./share
> 1.7G  .
> {code}
> A large component is bundle-2.23.19.jar, which is [AWS Java SDK :: 
> Bundle|https://mvnrepository.com/artifact/software.amazon.awssdk/bundle/2.23.19]
> {code:java}
> hadoop-3.4.0$ls -lh share/hadoop/tools/lib/bundle-2.23.19.jar  
> -rw-r--r--@ 1 szetszwo  staff   530M Mar  4 15:41 
> share/hadoop/tools/lib/bundle-2.23.19.jar
> {code}
> We should revisit if such a large jar is really needed to be included in the 
> release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19203) WrappedIO BulkDelete API to raise IOEs as UncheckedIOExceptions

2024-06-20 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19203.
-
Fix Version/s: 3.4.1
   Resolution: Fixed

> WrappedIO BulkDelete API to raise IOEs as UncheckedIOExceptions
> ---
>
> Key: HADOOP-19203
> URL: https://issues.apache.org/jira/browse/HADOOP-19203
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 3.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> It's easier to invoke methods through reflection through parquet/iceberg 
> DynMethods if the invoked method raises unchecked exceptions, because it 
> doesn't then rewrape the raised exception in a generic RuntimeException
> Catching the IOEs and wrapping as UncheckedIOEs makes it much easier to 
> unwrap IOEs after the invocation



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19208) ABFS: Fixing logic to determine HNS nature of account to avoid extra getAcl() calls

2024-06-19 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-19208:
--

 Summary: ABFS: Fixing logic to determine HNS nature of account to 
avoid extra getAcl() calls
 Key: HADOOP-19208
 URL: https://issues.apache.org/jira/browse/HADOOP-19208
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Anuj Modi
 Fix For: 3.5.0, 3.4.1


ABFS driver needs to know the type of account being used. It relies on the user 
to inform the account type using the config `fs.azure.account.hns.enabled`.
If not configured, the driver makes a getAcl call to determine the account type.

The expectation is that getAcl() will fail with 400 Bad Request if made on an 
FNS account.
Any other outcome, including 200 or 404, indicates the account is HNS.

Today, when determining this, the logic only checks for a status code of either 
200 or 400. In case of 404, nothing is inferred, and this leads to repeated 
getAcl calls until a 200 or 400 is returned.

The fix is to update the logic such that if getAcl() fails with 400, it is an 
FNS account; all other cases indicate an HNS account. In case of throttling, 
if all retries are exhausted, FS init itself will fail.
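
A hedged sketch of the revised decision (helper name and shape are illustrative, not the actual ABFS code):

{code:java}
/**
 * Decide the account type from the status code of the getAcl() probe.
 * Only 400 Bad Request marks the account as FNS; any other outcome
 * (200, 404, ...) is treated as HNS, avoiding repeated probes.
 */
static boolean isHnsAccount(int getAclStatusCode) {
  return getAclStatusCode != 400;
}
{code}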



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19207) ABFS: [FnsOverBlob]Response Handling of Blob Endpoint APIs and Metadata APIs

2024-06-18 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-19207:
--

 Summary: ABFS: [FnsOverBlob]Response Handling of Blob Endpoint 
APIs and Metadata APIs
 Key: HADOOP-19207
 URL: https://issues.apache.org/jira/browse/HADOOP-19207
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Anuj Modi
 Fix For: 3.5.0, 3.4.1


Blob Endpoint APIs have a different response format than DFS Endpoint APIs.
There are some behavioral differences as well that need to be handled on the 
client side.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19206) Hadoop release contains a 530MB bundle-2.23.19.jar

2024-06-17 Thread Tsz-wo Sze (Jira)
Tsz-wo Sze created HADOOP-19206:
---

 Summary: Hadoop release contains a 530MB bundle-2.23.19.jar
 Key: HADOOP-19206
 URL: https://issues.apache.org/jira/browse/HADOOP-19206
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build
Reporter: Tsz-wo Sze


The size of Hadoop binary release (v3.4.0) is 1.7 GB.
{code:java}
hadoop-3.4.0$du -h -d 1
$du -h -d 1 .
2.0M	./bin
260K	./libexec
 72K	./include
212K	./sbin
184K	./etc
232K	./licenses-binary
316M	./lib
1.4G	./share
1.7G	.
{code}
A large component is bundle-2.23.19.jar, which is [AWS Java SDK :: 
Bundle|https://mvnrepository.com/artifact/software.amazon.awssdk/bundle/2.23.19]
{code:java}
hadoop-3.4.0$ls -lh share/hadoop/tools/lib/bundle-2.23.19.jar  
-rw-r--r--@ 1 szetszwo  staff   530M Mar  4 15:41 
share/hadoop/tools/lib/bundle-2.23.19.jar
{code}
We should revisit if such a large jar is really needed to be included in the 
release.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18508) support multiple s3a integration test runs on same bucket in parallel

2024-06-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18508.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> support multiple s3a integration test runs on same bucket in parallel
> -
>
> Key: HADOOP-18508
> URL: https://issues.apache.org/jira/browse/HADOOP-18508
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.9
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> to have (internal, sorry) jenkins test runs work in parallel, they need to 
> share the same bucket, so they:
> # must have a prefix for the job id which is passed in to the path used for forks
> # must support disabling root tests so they don't stamp on each other



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18931) FileSystem.getFileSystemClass() to log at debug the jar the .class came from

2024-06-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18931.
-
Fix Version/s: 3.5.0
   3.4.1
 Assignee: Viraj Jasani
   Resolution: Fixed

> FileSystem.getFileSystemClass() to log at debug the jar the .class came from
> 
>
> Key: HADOOP-18931
> URL: https://issues.apache.org/jira/browse/HADOOP-18931
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 3.3.6
>Reporter: Steve Loughran
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> we want to be able to log the jar the filesystem implementation class came 
> from, so that we can identify which version of a module the class was loaded 
> from.
> this is to help track down problems where different machines in the cluster 
> or the .tar.gz bundle are out of date. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19192) Log level is WARN when fail to load native hadoop libs

2024-06-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19192.
-
Fix Version/s: 3.5.0
   3.4.1
 Assignee: Cheng Pan
   Resolution: Fixed

> Log level is WARN when fail to load native hadoop libs
> --
>
> Key: HADOOP-19192
> URL: https://issues.apache.org/jira/browse/HADOOP-19192
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.3.6
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19205) S3A initialization/close slower than with v1 SDK

2024-06-14 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19205:
---

 Summary: S3A initialization/close slower than with v1 SDK
 Key: HADOOP-19205
 URL: https://issues.apache.org/jira/browse/HADOOP-19205
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Hive QE have observed a slowdown in LLAP queries due to the time taken to create 
and close s3a filesystem instances. A key aspect of that is they keep closing 
the fs instances (HIVE-27884), but looking at the profiles, the reasons things 
seem to have regressed are:

* two s3 clients are being created (sync and async)
* these seem to take a lot of time scanning the classpath for "global 
interceptors", which is at least an O(jars) operation; the number of index 
entries in the zip files may be a factor too.

Proposed:
* create the async client on demand when the transfer manager is invoked (see 
the sketch after the list below)
* look at why passwords are being scanned for if 
InstanceProfileCredentialsProvider is in use... that seems slow too

SDK wishes
* SDK maybe allow us to turn off that scan for interceptors?
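
A hedged sketch of the on-demand creation proposed above (class, field and method names are illustrative, not the actual S3AFileSystem code):

{code:java}
import software.amazon.awssdk.services.s3.S3AsyncClient;

public class LazyAsyncClientHolder {

  private volatile S3AsyncClient asyncClient;

  /**
   * Build the async client on first use only, so filesystem init avoids
   * the interceptor classpath scan unless the transfer manager is
   * actually invoked.
   */
  public S3AsyncClient getOrCreateAsyncClient() {
    if (asyncClient == null) {
      synchronized (this) {
        if (asyncClient == null) {
          asyncClient = S3AsyncClient.builder().build();
        }
      }
    }
    return asyncClient;
  }
}
{code}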

attaching screenshots of the profile. storediag snippet:
{code}

[001]  fs.s3a.access.key = (unset)
[002]  fs.s3a.secret.key = (unset)
[003]  fs.s3a.session.token = (unset)
[004]  fs.s3a.server-side-encryption-algorithm = (unset)
[005]  fs.s3a.server-side-encryption.key = (unset)
[006]  fs.s3a.encryption.algorithm = (unset)
[007]  fs.s3a.encryption.key = (unset)
[008]  fs.s3a.aws.credentials.provider = 
"com.amazonaws.auth.InstanceProfileCredentialsProvider" [core-site.xml]

{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19204) VectorIO regression: empty ranges are now rejected

2024-06-12 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19204:
---

 Summary: VectorIO regression: empty ranges are now rejected
 Key: HADOOP-19204
 URL: https://issues.apache.org/jira/browse/HADOOP-19204
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 3.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran


The validation now rejects a readVectored() call with an empty range list, 
whereas before it was a no-op.

Proposed fix: return the empty list; add a test.
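
A hedged sketch of the proposed fix (method shape is illustrative, not the actual range-validation code):

{code:java}
import java.util.List;
import org.apache.hadoop.fs.FileRange;

// Restore the pre-regression behaviour: an empty range list is a no-op,
// returned as-is instead of failing validation.
static List<? extends FileRange> validateAndSortRanges(
    List<? extends FileRange> ranges) {
  if (ranges.isEmpty()) {
    return ranges;
  }
  // ... existing overlap/ordering validation would continue here ...
  return ranges;
}
{code}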





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19203) WrappedIO BulkDelete API to raise IOEs as UncheckedIOExceptions

2024-06-12 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19203:
---

 Summary: WrappedIO BulkDelete API to raise IOEs as 
UncheckedIOExceptions
 Key: HADOOP-19203
 URL: https://issues.apache.org/jira/browse/HADOOP-19203
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 3.4.1
Reporter: Steve Loughran



It's easier to invoke methods through reflection via parquet/iceberg 
DynMethods if the invoked method raises unchecked exceptions, because it 
doesn't then rewrap the raised exception in a generic RuntimeException.

Catching the IOEs and wrapping them as UncheckedIOEs makes it much easier to 
unwrap IOEs after the invocation.
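
A minimal sketch of the wrapping pattern (method name and signature are illustrative, not the actual WrappedIO API):

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public final class WrappedIOSketch {

  /**
   * Rethrow the checked IOException as UncheckedIOException so reflective
   * callers (e.g. DynMethods) can unwrap the IOE directly.
   */
  public static boolean fileSystem_delete(FileSystem fs, Path path) {
    try {
      return fs.delete(path, false);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
{code}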



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19196) Bulk delete api doesn't take the path to delete as the base path

2024-06-11 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19196?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-19196.

Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> Bulk delete api doesn't take the path to delete as the base path
> 
>
> Key: HADOOP-19196
> URL: https://issues.apache.org/jira/browse/HADOOP-19196
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> If you use the path of the file you intend to delete as the base path, you 
> get an error. This is because the validation requires the paths in the list 
> to be children of the base path, but the base path itself should also be valid.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19137) [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if Customer-provided-key configs given.

2024-06-11 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-19137.

Resolution: Fixed

> [ABFS]Prevent ABFS initialization for non-hierarchal-namespace account if 
> Customer-provided-key configs given.
> --
>
> Key: HADOOP-19137
> URL: https://issues.apache.org/jira/browse/HADOOP-19137
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The store doesn't flow the namespace information to the client. 
> In https://github.com/apache/hadoop/pull/6221, getIsNamespaceEnabled is added 
> in client methods; it checks whether the namespace information is present, and 
> if not, it makes a getAcl call and sets the field. Once the field is 
> set, it is used in future getIsNamespaceEnabled method calls for a 
> given AbfsClient.
> Since CPK (both global and encryptionContext) is only for HNS accounts, the 
> proposed fix is to fail fs init if it is a non-HNS account and a 
> CPK config is given.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19202) EC: Support decommissioning DataNode by EC block reconstruction

2024-06-11 Thread Chenyu Zheng (Jira)
Chenyu Zheng created HADOOP-19202:
-

 Summary: EC: Support decommissioning DataNode by EC block 
reconstruction
 Key: HADOOP-19202
 URL: https://issues.apache.org/jira/browse/HADOOP-19202
 Project: Hadoop Common
  Issue Type: Improvement
Reporter: Chenyu Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19199) Include FileStatus when opening a file from FileSystem

2024-06-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19199.
-
Resolution: Duplicate

Closing as a duplicate of HADOOP-15229. 

I absolutely agree the head requests are needless. Which is why we added exactly 
the feature you wanted in 2019, *five years ago*. And in HADOOP-16202, you only 
need to pass in the file length, so if you can store that in your manifests, 
then you can skip the HEAD call (s3a; abfs still needs it).

The problem we have is therefore not that the Hadoop library lacks this, it is 
that libraries and applications haven't taken it up. Why not? Because they want 
to compile against versions of hadoop that are over 10 years old. Which means 
that all the improvements we have made are wasted. Although private forks can do 
this, it's very hard to get this taken up consistently, and people like you 
and I suffer in wasted time and money.

What can be done? Well, I have concluded that trying to get the projects to 
upgrade doesn't work, and waiting for the libraries to "get up-to-date" is a 
moving target as we are always trying to improve in this area. Instead, all our 
new work is being targeted at being "reflection-friendly", expecting the 
initial take-up to be through reflection. In HADOOP-19131 I am exporting the 
existing openFile() API (which takes a builder and returns an asynchronously 
evaluated input stream) as an easy-to-reflect function:

{code}
public static FSDataInputStream fileSystem_openFile(
  final FileSystem fs,
  final Path path,
  final String policy,
  final FileStatus status,
  final Long length,
  final Map<String, String> options) throws IOException {
{code}

The "policy" is also critical as it tells the storage layer what access policy 
you want, such as random or sequential. I'm going to add an explicit "parquet" 
policy here too, which hence to the library that footer caching would be good.

What can you do then, other than just waiting for this to happen? Help us get 
this through the stack. We need it in: parquet, iceberg, spark, avro. 

Can you start by reviewing HADOOP-19131 and seeing how well you think it will 
integrate, *and anything you can do in terms of Proof of Concept PRs using this 
patch*, so we can identify problems before the hadoop patch is merged.
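
For context, a sketch of how the existing builder API is used today (fs, path and status are assumed to be in scope; the option key is the standard fs.option.openfile.read.policy):

{code:java}
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.util.functional.FutureIO;

// Passing the known FileStatus lets s3a skip the HEAD request entirely;
// the read policy tells the store to expect random/columnar access.
FSDataInputStream in = FutureIO.awaitFuture(
    fs.openFile(path)
        .withFileStatus(status)
        .opt("fs.option.openfile.read.policy", "random")
        .build());
{code}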


> Include FileStatus when opening a file from FileSystem
> --
>
> Key: HADOOP-19199
> URL: https://issues.apache.org/jira/browse/HADOOP-19199
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 3.4.0
>Reporter: Oliver Caballero Alvarez
>Priority: Major
>  Labels: pull-request-available
>
> The FileSystem abstract class prevents that if you have information about the 
> FileStatus of a file, you use it to open that file, which means that in the 
> implementations of the open method, they have to request the FileStatus of 
> the same file again, making unnecessary requests.
> A very clear example is seen in today's latest version of the parquet-hadoop 
> implementation, where:
> https://github.com/apache/parquet-java/blob/apache-parquet-1.14.0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopInputFile.java
> Although to create the implementation you had to consult the file to know its 
> FileStatus, when opening it only the path is included, since the FileSystem 
> implementation is the only thing it allows you to do. This implies that the 
> implementation will surely, in its open function, verify that the file exists 
> or what information the file has and perform the same operation again to 
> collect the FileStatus.
>  
> This would simply be resolved by taking the latest current version:
>  
> [https://github.com/apache/hadoop/blob/release-3.4.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java]
> and including the following:
>  
>   public FSDataInputStream open(FileStatus f) throws IOException {
>         return this.open(f.getPath(), 
> this.getConf().getInt("io.file.buffer.size", 4096));
>     }
>  
> This would imply that it is backward compatible with all current Filesystems, 
> but since it is in the implementation it could be used when this information 
> is already known.
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19200) Reduce the number of headObject when opening a file with the s3 file system

2024-06-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19200.
-
Resolution: Duplicate

> Reduce the number of headObject when opening a file with the s3 file system
> ---
>
> Key: HADOOP-19200
> URL: https://issues.apache.org/jira/browse/HADOOP-19200
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Oliver Caballero Alvarez
>Priority: Major
>
> In the implementation of the S3 filesystem in the hadoop-aws package, if you 
> use it with spark, every time you open a file you end up sending two 
> HeadObject requests: to open the file, you first check whether the file 
> exists, executing a HeadObject, and then on opening it, the implementation, 
> in both sdk1 and sdk2, forces you to make a HeadObject again. This is not the 
> fault of the implementation of this class (S3AFileSystem), but of the 
> abstract FileSystem class of the Hadoop core, since it does not allow a 
> FileStatus to be passed but only allows the use of a Path.
> If the FileSystem implementation is changed, it could be used to avoid 
> requesting that HeadObject again.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19201) Support external id in assume role

2024-06-08 Thread Smith Cruise (Jira)
Smith Cruise created HADOOP-19201:
-

 Summary: Support external id in assume role
 Key: HADOOP-19201
 URL: https://issues.apache.org/jira/browse/HADOOP-19201
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Smith Cruise
 Fix For: 3.4.1


Support external id in AssumedRoleCredentialProvider.java



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19200) Reduce the number of headObject when opening a file with the s3 file system

2024-06-08 Thread Oliver Caballero Alvarez (Jira)
Oliver Caballero Alvarez created HADOOP-19200:
-

 Summary: Reduce the number of headObject when opening a file with 
the s3 file system
 Key: HADOOP-19200
 URL: https://issues.apache.org/jira/browse/HADOOP-19200
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Affects Versions: 3.3.6, 3.4.0
Reporter: Oliver Caballero Alvarez


In the implementation of the S3 filesystem in the hadoop-aws package, if you 
use it with spark, every time you open a file you end up sending two 
HeadObject requests: to open the file, you first check whether the file exists, 
executing a HeadObject, and then on opening it, the implementation, in both 
sdk1 and sdk2, forces you to make a HeadObject again. This is not the fault of 
the implementation of this class (S3AFileSystem), but of the abstract 
FileSystem class of the Hadoop core, since it does not allow a FileStatus to be 
passed but only allows the use of a Path.

If the FileSystem implementation is changed, it could be used to avoid 
requesting that HeadObject again.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19199) Include FileStatus when opening a file from FileSystem

2024-06-08 Thread Oliver Caballero Alvarez (Jira)
Oliver Caballero Alvarez created HADOOP-19199:
-

 Summary: Include FileStatus when opening a file from FileSystem
 Key: HADOOP-19199
 URL: https://issues.apache.org/jira/browse/HADOOP-19199
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 3.4.0
Reporter: Oliver Caballero Alvarez


The FileSystem abstract class prevents you from using already-known FileStatus 
information to open a file, which means that implementations of the open method 
have to request the FileStatus of the same file again, making unnecessary 
requests.

A very clear example is seen in today's latest version of the parquet-hadoop 
implementation:

https://github.com/apache/parquet-java/blob/apache-parquet-1.14.0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopInputFile.java

Although the FileStatus had to be consulted to create the implementation, only 
the path is included when opening, since that is all the FileSystem interface 
allows. This implies that the implementation will, in its open function, verify 
that the file exists or what information the file has, performing the same 
operation again to collect the FileStatus.

 

This would simply be resolved by taking the latest current version:

[https://github.com/apache/hadoop/blob/release-3.4.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java]

and including the following:

{code:java}
public FSDataInputStream open(FileStatus f) throws IOException {
  return this.open(f.getPath(), this.getConf().getInt("io.file.buffer.size", 4096));
}
{code}

 

This would imply that it is backward compatible with all current FileSystems, 
but since it is in the implementation it could be used when this information is 
already known.


--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18516) [ABFS]: Support fixed SAS token config in addition to Custom SASTokenProvider Implementation

2024-06-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18516.
-
Fix Version/s: 3.4.1
   Resolution: Fixed

> [ABFS]: Support fixed SAS token config in addition to Custom SASTokenProvider 
> Implementation
> 
>
> Key: HADOOP-18516
> URL: https://issues.apache.org/jira/browse/HADOOP-18516
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sree Bhattacharyya
>Assignee: Anuj Modi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> This PR introduces a new configuration for Fixed SAS Tokens: 
> *"fs.azure.sas.fixed.token"*
> Using this new configuration, users can configure a fixed SAS Token in the 
> account settings files itself. Ideally, this should be used with SAS Tokens 
> that are scoped at a container or account level (Service or Account SAS), 
> which can be considered to be a constant for one account or container, over 
> multiple operations.
> The other method of using a SAS Token remains valid as well, where a user 
> provides a custom implementation of the SASTokenProvider interface, through 
> which a SAS Token is obtained.
> When an Account SAS Token is configured as the fixed SAS Token, and it is 
> used, it is ensured that operations are within the scope of the SAS Token.
> The code checks for whether the fixed token and the token provider class 
> implementation are configured. In the case of both being set, preference is 
> given to the custom SASTokenProvider implementation. It must be noted that if 
> such an implementation provides a SAS Token which has a lower scope than 
> Account SAS, some filesystem and service level operations might be out of 
> scope and may not succeed.
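
For reference, a minimal core-site.xml entry using the new key (the value is a placeholder, not a real token):

{code:xml}
<property>
  <name>fs.azure.sas.fixed.token</name>
  <value>FIXED_ACCOUNT_OR_CONTAINER_SAS_TOKEN</value>
</property>
{code}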



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19178) WASB Driver Deprecation and eventual removal

2024-06-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19178.
-
Fix Version/s: 3.3.9
   3.5.0
 Assignee: Anuj Modi  (was: Sneha Vijayarajan)
   Resolution: Fixed

> WASB Driver Deprecation and eventual removal
> 
>
> Key: HADOOP-19178
> URL: https://issues.apache.org/jira/browse/HADOOP-19178
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sneha Vijayarajan
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> *WASB Driver*
> The WASB driver was developed to support FNS (FlatNameSpace) Azure Storage 
> accounts. FNS accounts do not honor File-Folder syntax. HDFS folder 
> operations are hence mimicked at the client side by the WASB driver, and 
> certain folder operations like Rename and Delete can lead to a lot of IOPs, 
> with client-side enumeration and orchestration of the rename/delete operation 
> blob by blob. It was not ideal for other APIs either, as initial checks for 
> whether a path is a file or a folder need to be done over multiple metadata 
> calls. These led to degraded performance.
> To provide better service to Analytics customers, Microsoft released ADLS 
> Gen2, which is HNS (Hierarchical Namespace), i.e. a File-Folder aware store. 
> The ABFS driver was designed to overcome the inherent deficiencies of WASB, 
> and customers were advised to migrate to the ABFS driver.
> *Customers who still use the legacy WASB driver and the challenges they face* 
> Some of our customers have not migrated to the ABFS driver yet and continue 
> to use the legacy WASB driver with FNS accounts.  
> These customers face the following challenges: 
>  * They cannot leverage the optimizations and benefits of the ABFS driver.
>  * They need to deal with the compatibility issues should the files and 
> folders were modified with the legacy WASB driver and the ABFS driver 
> concurrently in a phased transition situation.
>  * There are differences for supported features for FNS and HNS over ABFS 
> Driver
>  * In certain cases, they must perform a significant amount of re-work on 
> their workloads to migrate to the ABFS driver, which is available only on HNS 
> enabled accounts in a fully tested and supported scenario.
> *Deprecation plans for WASB*
> We are introducing a new feature that will enable the ABFS driver to support 
> FNS accounts (over BlobEndpoint) using the ABFS scheme. This feature will 
> enable customers to use the ABFS driver to interact with data stored in GPv2 
> (General Purpose v2) storage accounts. 
> With this feature, the customers who still use the legacy WASB driver will be 
> able to migrate to the ABFS driver without much re-work on their workloads. 
> They will however need to change the URIs from the WASB scheme to the ABFS 
> scheme. 
> Once ABFS driver has built FNS support capability to migrate WASB customers, 
> WASB driver will be declared deprecated in OSS documentation and marked for 
> removal in next major release. This will remove any ambiguity for new 
> customer onboards as there will be only one Microsoft driver for Azure 
> Storage and migrating customers will get SLA bound support for driver and 
> service, which was not guaranteed over WASB.
>  We anticipate that this feature will serve as a stepping stone for customers 
> to move to HNS enabled accounts with the ABFS driver, which is our 
> recommended stack for big data analytics on ADLS Gen2. 
> *Any Impact for* *existing customers who are using ADLS Gen2 (HNS enabled 
> account) with ABFS driver* *?*
> This feature does not impact the existing customers who are using ADLS Gen2 
> (HNS enabled account) with ABFS driver.
> They do not need to make any changes to their workloads or configurations. 
> They will still enjoy the benefits of HNS, such as atomic operations, 
> fine-grained access control, scalability, and performance. 
> *Official recommendation*
> Microsoft continues to recommend all Big Data and Analytics customers to use 
> Azure Data Lake Gen2 (ADLS Gen2) using the ABFS driver and will continue to 
> optimize this scenario in future, we believe that this new option will help 
> all those customers to transition to a supported scenario immediately, while 
> they plan to ultimately move to ADLS Gen2 (HNS enabled account).
>  *New Authentication options that a WASB to ABFS Driver migrating customer 
> will get*
> Below auth types that WASB provides will continue 

[jira] [Resolved] (HADOOP-19114) upgrade to commons-compress 1.26.1 due to cves

2024-06-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19114.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> upgrade to commons-compress 1.26.1 due to cves
> --
>
> Key: HADOOP-19114
> URL: https://issues.apache.org/jira/browse/HADOOP-19114
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, CVE
>Affects Versions: 3.4.0
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> 2 recent CVEs fixed - 
> https://mvnrepository.com/artifact/org.apache.commons/commons-compress
> Important: Denial of Service CVE-2024-25710
> Moderate: Denial of Service CVE-2024-26308



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19197) S3A: Support AWS KMS Encryption Context

2024-06-06 Thread Raphael Azzolini (Jira)
Raphael Azzolini created HADOOP-19197:
-

 Summary: S3A: Support AWS KMS Encryption Context
 Key: HADOOP-19197
 URL: https://issues.apache.org/jira/browse/HADOOP-19197
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Raphael Azzolini


S3A properties allow users to choose the AWS KMS key 
({_}fs.s3a.encryption.key{_}) and the S3 encryption algorithm to be used 
({_}fs.s3a.encryption.algorithm{_}). In addition to the AWS KMS key, an 
encryption context can be used as non-secret data that adds additional 
integrity and authenticity checks to the encrypted data. However, there is no 
option to specify the [AWS KMS Encryption 
Context|https://docs.aws.amazon.com/kms/latest/developerguide/concepts.html#encrypt_context]
 in S3A.

In AWS SDK v2 the encryption context in S3 requests is set by the parameter 
[ssekmsEncryptionContext.|https://sdk.amazonaws.com/java/api/latest/software/amazon/awssdk/services/s3/model/CreateMultipartUploadRequest.Builder.html#ssekmsEncryptionContext(java.lang.String)]
 It receives a base64-encoded UTF-8 string holding JSON with the encryption 
context key-value pairs. The value of this parameter could be set by the user 
in a new property {_}*fs.s3a.encryption.context*{_}, and be stored in the 
[EncryptionSecrets|https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/auth/delegation/EncryptionSecrets.java]
 to later be used when setting the encryption parameters in 
[RequestFactoryImpl|https://github.com/apache/hadoop/blob/f92a8ab8ae54f11946412904973eb60404dee7ff/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/RequestFactoryImpl.java].
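
A sketch of how such a value could be produced (the property name is from this proposal, not a released key; the key-value pairs are illustrative):

{code:java}
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Encryption context: JSON key-value pairs, UTF-8 encoded, then base64,
// matching what ssekmsEncryptionContext expects.
String json = "{\"project\":\"analytics\",\"team\":\"data-eng\"}";
String contextValue =
    Base64.getEncoder().encodeToString(json.getBytes(StandardCharsets.UTF_8));
// contextValue would then be the value of fs.s3a.encryption.context.
{code}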



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19196) Bulk delete api doesn't take the path to delete as the base path

2024-06-06 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19196:
---

 Summary: Bulk delete api doesn't take the path to delete as the 
base path
 Key: HADOOP-19196
 URL: https://issues.apache.org/jira/browse/HADOOP-19196
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.5.0, 3.4.1
Reporter: Steve Loughran


If you use the path of the file you intend to delete as the base path, you get 
an error. This is because the validation requires the paths in the list to be 
children of the base path, but the base path itself should also be valid.
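
A sketch of the failing pattern (API shape per HADOOP-18679; fs is assumed to be in scope and signatures are illustrative):

{code:java}
import java.util.Collections;
import org.apache.hadoop.fs.BulkDelete;
import org.apache.hadoop.fs.Path;

Path file = new Path("s3a://bucket/dir/file.txt");
try (BulkDelete bd = fs.createBulkDelete(file)) {
  // Rejected today: the file is not a strict child of itself,
  // even though deleting the base path should be legal.
  bd.bulkDelete(Collections.singletonList(file));
}
{code}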



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19195) Upgrade aws sdk v2 to 2.25.53

2024-06-06 Thread Harshit Gupta (Jira)
Harshit Gupta created HADOOP-19195:
--

 Summary: Upgrade aws sdk v2 to 2.25.53
 Key: HADOOP-19195
 URL: https://issues.apache.org/jira/browse/HADOOP-19195
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Affects Versions: 3.5.0, 3.4.1
Reporter: Harshit Gupta
Assignee: Harshit Gupta
 Fix For: 3.5.0


Upgrade aws sdk v2 to 2.25.53



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19194) Add test to find unshaded dependencies in the aws sdk

2024-06-06 Thread Harshit Gupta (Jira)
Harshit Gupta created HADOOP-19194:
--

 Summary: Add test to find unshaded dependencies in the aws sdk
 Key: HADOOP-19194
 URL: https://issues.apache.org/jira/browse/HADOOP-19194
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Harshit Gupta
Assignee: Harshit Gupta
 Fix For: 3.4.1


Write a test to assess the aws sdk for unshaded artefacts on the class path 
which might cause deployment failures. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19193) Create orphan commit for website deployment

2024-06-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19193.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Create orphan commit for website deployment
> ---
>
> Key: HADOOP-19193
> URL: https://issues.apache.org/jira/browse/HADOOP-19193
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19193) Create orphan commit for website deployment

2024-06-05 Thread Cheng Pan (Jira)
Cheng Pan created HADOOP-19193:
--

 Summary: Create orphan commit for website deployment
 Key: HADOOP-19193
 URL: https://issues.apache.org/jira/browse/HADOOP-19193
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Reporter: Cheng Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19192) Log level is WARN when fail to load native hadoop libs

2024-06-05 Thread Cheng Pan (Jira)
Cheng Pan created HADOOP-19192:
--

 Summary: Log level is WARN when fail to load native hadoop libs
 Key: HADOOP-19192
 URL: https://issues.apache.org/jira/browse/HADOOP-19192
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation
Affects Versions: 3.3.6
Reporter: Cheng Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19188) TestHarFileSystem and TestFilterFileSystem failing after bulk delete API added

2024-06-04 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19188.
-
Resolution: Fixed

> TestHarFileSystem and TestFilterFileSystem failing after bulk delete API added
> --
>
> Key: HADOOP-19188
> URL: https://issues.apache.org/jira/browse/HADOOP-19188
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs, test
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> oh, we need to update a couple of tests so they know not to worry about the 
> new interface/method. The details are in the javadocs of FileSystem.
> Interesting that these snuck through yetus, though they fail in PRs based 
> atop #6726.
> {code}
> [ERROR] Failures: 
> [ERROR] org.apache.hadoop.fs.TestFilterFileSystem.testFilterFileSystem
> [ERROR]   Run 1: TestFilterFileSystem.testFilterFileSystem:181 1 methods were 
> not overridden correctly - see log
> [ERROR]   Run 2: TestFilterFileSystem.testFilterFileSystem:181 1 methods were 
> not overridden correctly - see log
> [ERROR]   Run 3: TestFilterFileSystem.testFilterFileSystem:181 1 methods were 
> not overridden correctly - see log
> [INFO] 
> [ERROR] org.apache.hadoop.fs.TestHarFileSystem.testInheritedMethodsImplemented
> [ERROR]   Run 1: TestHarFileSystem.testInheritedMethodsImplemented:402 1 
> methods were not overridden correctly - see log
> [ERROR]   Run 2: TestHarFileSystem.testInheritedMethodsImplemented:402 1 
> methods were not overridden correctly - see log
> [ERROR]   Run 3: TestHarFileSystem.testInheritedMethodsImplemented:402 1 
> methods were not overridden correctly - see log
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19191) Batch APIs for delete

2024-06-04 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19191.
-
Resolution: Duplicate

Fixed in HADOOP-18679; there's an iceberg PR up to use the reflection-friendly 
WrappedIO access point.

That feature will ship in hadoop 3.4.1; I would like a basic backport to 
branch-3.3 where, even though the full s3a-side backport would be impossible 
(sdk versions...), we could at least offer the public API to all and the 
page-size=1 DELETE call for S3, *without any safety checks*. It'll still save 
some LIST calls and encourage adoption.

If you want to get involved there, happy to take PRs (under the original JIRA).

> Batch APIs for delete
> -
>
> Key: HADOOP-19191
> URL: https://issues.apache.org/jira/browse/HADOOP-19191
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Reporter: Alkis Evlogimenos
>Priority: Major
>
> Add batch APIs for delete to allow better performance for object stores:
> {{boolean[] delete(Path[] paths);}}
> The API should have a default implementation that delegates to the singular 
> delete. Implementations can override to provide better performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19191) Batch APIs for delete

2024-06-04 Thread Alkis Evlogimenos (Jira)
Alkis Evlogimenos created HADOOP-19191:
--

 Summary: Batch APIs for delete
 Key: HADOOP-19191
 URL: https://issues.apache.org/jira/browse/HADOOP-19191
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Reporter: Alkis Evlogimenos


Add batch APIs for delete to allow better performance for object stores:

{{boolean[] delete(Path[] paths);}}

The API should have a default implementation that delegates to the singular 
delete. Implementations can override to provide better performance.
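
A sketch of the proposed default implementation (interface shape taken from this proposal, not from Hadoop's current API):

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.Path;

public interface BatchDelete {

  /** Existing single-path delete (matches FileSystem#delete). */
  boolean delete(Path path, boolean recursive) throws IOException;

  /**
   * Default batch form: delegate to the singular delete; object-store
   * implementations can override with a real bulk call.
   */
  default boolean[] delete(Path[] paths) throws IOException {
    boolean[] results = new boolean[paths.length];
    for (int i = 0; i < paths.length; i++) {
      results[i] = delete(paths[i], false);
    }
    return results;
  }
}
{code}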



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19190) Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when bucket not encrypted with sse-kms

2024-06-03 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19190?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-19190.

Resolution: Fixed

> Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes 
> when bucket not encrypted with sse-kms
> 
>
> Key: HADOOP-19190
> URL: https://issues.apache.org/jira/browse/HADOOP-19190
> Project: Hadoop Common
>  Issue Type: Test
>  Components: fs/s3
>Affects Versions: 3.4.1
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> [ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 12.80 
> s <<< FAILURE! -- in 
> org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings
> [ERROR] 
> org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes
>  -- Time elapsed: 5.065 s <<< FAILURE!
> org.junit.ComparisonFailure: [Server side encryption algorithm must match] 
> expected:<"[aws:kms]"> but was:<"[AES256]">
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native 
> Method)
> at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at 
> org.apache.hadoop.fs.s3a.EncryptionTestUtils.validateEncryptionFileAttributes(EncryptionTestUtils.java:138)
> at 
> org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes(ITestS3AEncryptionWithDefaultS3Settings.java:112)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:498)
> at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19190) Skip ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when bucket not encrypted with sse-kms

2024-05-31 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-19190:
--

 Summary: Skip 
ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes when 
bucket not encrypted with sse-kms
 Key: HADOOP-19190
 URL: https://issues.apache.org/jira/browse/HADOOP-19190
 Project: Hadoop Common
  Issue Type: Test
  Components: fs/s3
Affects Versions: 3.4.1
Reporter: Mukund Thakur
Assignee: Mukund Thakur


[ERROR] Tests run: 5, Failures: 1, Errors: 0, Skipped: 4, Time elapsed: 12.80 s 
<<< FAILURE! -- in 
org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings
[ERROR] 
org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes
 -- Time elapsed: 5.065 s <<< FAILURE!
org.junit.ComparisonFailure: [Server side encryption algorithm must match] 
expected:<"[aws:kms]"> but was:<"[AES256]">
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at 
org.apache.hadoop.fs.s3a.EncryptionTestUtils.validateEncryptionFileAttributes(EncryptionTestUtils.java:138)
at 
org.apache.hadoop.fs.s3a.ITestS3AEncryptionWithDefaultS3Settings.testEncryptionFileAttributes(ITestS3AEncryptionWithDefaultS3Settings.java:112)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19189) ITestS3ACommitterFactory failing

2024-05-31 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19189:
---

 Summary: ITestS3ACommitterFactory failing
 Key: HADOOP-19189
 URL: https://issues.apache.org/jira/browse/HADOOP-19189
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.4.0
Reporter: Steve Loughran


we've had ITestS3ACommitterFactory failing for a while, where it looks like 
changed committer settings aren't being picked up.

{code}
[ERROR] ITestS3ACommitterFactory.testEverything:115->testInvalidFileBinding:165 
Expected a org.apache.hadoop.fs.s3a.commit.PathCommitException to be thrown, 
but got the result: : 
FileOutputCommitter{PathOutputCommitter{context=TaskAttemptContextImpl{JobContextImpl
{code}

I've spent some time looking at it and it is happening because the test sets 
the filesystem ref for the local test fs, and not that of the filesystem 
created by the committer, which is where the option is picked up.

I've tried to parameterize it, but things are still playing up and I'm not sure 
how hard to try to fix it.






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19156) ZooKeeper based state stores use different ZK address configs

2024-05-29 Thread Xiaoqiao He (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoqiao He resolved HADOOP-19156.
--
Fix Version/s: 3.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> ZooKeeper based state stores use different ZK address configs
> -
>
> Key: HADOOP-19156
> URL: https://issues.apache.org/jira/browse/HADOOP-19156
> Project: Hadoop Common
>  Issue Type: Improvement
>Reporter: liu bin
>Assignee: liu bin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Currently, the ZooKeeper-based state stores of RM, YARN Federation, and HDFS 
> Federation use the same ZK address config {{hadoop.zk.address}}. But in 
> our production environment, we hope that different services can use different 
> ZKs to avoid mutual influence.
> This jira adds separate ZK address configs for each service.
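
For illustration, a hedged sketch of the shape of such per-service overrides; 
hadoop.zk.address is the existing shared key and yarn.resourcemanager.zk-address 
an existing YARN key, but the exact override keys this patch introduces are not 
reproduced here:

{code:xml}
<!-- shared default, used by any service without its own override -->
<property>
  <name>hadoop.zk.address</name>
  <value>zk-shared-1:2181,zk-shared-2:2181</value>
</property>
<!-- example override: the RM state store talks to its own ensemble -->
<property>
  <name>yarn.resourcemanager.zk-address</name>
  <value>zk-rm-1:2181,zk-rm-2:2181</value>
</property>
{code}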



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18679) Add API for bulk/paged delete of files and objects

2024-05-28 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-18679.

Resolution: Fixed

> Add API for bulk/paged delete of files and objects
> --
>
> Key: HADOOP-18679
> URL: https://issues.apache.org/jira/browse/HADOOP-18679
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.5
>Reporter: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> Iceberg and HBase could benefit from being able to give a list of individual 
> files to delete, files which may be scattered around the bucket, for better 
> read performance. 
> Add some new optional interface for an object store which allows a caller to 
> submit a list of paths to files to delete, where
> the expectation is
> * if a path is a file: delete
> * if a path is a dir, outcome undefined
> For S3 that'd let us build these into DeleteRequest objects and submit them 
> without any probes first.
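
A hedged sketch of what a caller could look like; the names follow the 
BulkDelete interface that eventually landed, but treat the exact signatures as 
indicative rather than definitive:

{code:java}
import java.io.IOException;
import java.util.Collection;
import java.util.List;
import java.util.Map;

import org.apache.hadoop.fs.BulkDelete;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BulkDeleteSketch {
  // delete a batch of scattered files under a base path; per-path failures
  // come back as (path, error) pairs instead of failing the whole batch
  static void deleteBatch(FileSystem fs, Path base, Collection<Path> paths)
      throws IOException {
    try (BulkDelete bulk = fs.createBulkDelete(base)) {
      // callers keep each batch within the store's page size (bulk.pageSize())
      List<Map.Entry<Path, String>> failures = bulk.bulkDelete(paths);
      failures.forEach(e ->
          System.err.println("Failed: " + e.getKey() + " : " + e.getValue()));
    }
  }
}
{code}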



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19184) TestStagingCommitter.testJobCommitFailure failing

2024-05-28 Thread Mukund Thakur (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mukund Thakur resolved HADOOP-19184.

Fix Version/s: 3.4.1
   Resolution: Fixed

> TestStagingCommitter.testJobCommitFailure failing 
> --
>
> Key: HADOOP-19184
> URL: https://issues.apache.org/jira/browse/HADOOP-19184
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.4.1
>
>
> {code:java}
> [INFO] 
> [ERROR] Failures: 
> [ERROR]   TestStagingCommitter.testJobCommitFailure:662 [Committed objects 
> compared to deleted paths 
> org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@1b4ab85{
>  requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, 
> deletes=0}] 
> Expecting:
>   
> <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566",
>     
> "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11",
>     
> "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f",
>     
> "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45",
>     
> "s3a://bucket-name/output/path/r_1_1_600b7e65-a7ff-4d07-b763-c4339a9164ad"]>
> to contain exactly in any order:
>   <[]>
> but the following elements were unexpected:
>   
> <["s3a://bucket-name/output/path/r_0_0_0e1f4790-4d3f-4abb-ba98-2b39ec8b7566",
>     
> "s3a://bucket-name/output/path/r_0_0_92306fea-0219-4ba5-a2b6-091d95547c11",
>     
> "s3a://bucket-name/output/path/r_1_1_016c4a25-a1f7-4e01-918e-e24a32c7525f",
>     
> "s3a://bucket-name/output/path/r_0_0_b2698dab-5870-4bdb-98ab-0ef5832eca45",{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19188) TestHarFileSystem and TestFilterFileSystem failing after bulk delete API added

2024-05-27 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19188:
---

 Summary: TestHarFileSystem and TestFilterFileSystem failing after 
bulk delete API added
 Key: HADOOP-19188
 URL: https://issues.apache.org/jira/browse/HADOOP-19188
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs, test
Affects Versions: 3.5.0
Reporter: Steve Loughran
Assignee: Mukund Thakur


Oh, we need to update a couple of tests so they know not to worry about the new 
interface/method. The details are in the javadocs of FileSystem.

Interesting that these snuck through Yetus, though they fail in PRs based atop 
#6726

{code}
[ERROR] Failures: 
[ERROR] org.apache.hadoop.fs.TestFilterFileSystem.testFilterFileSystem
[ERROR]   Run 1: TestFilterFileSystem.testFilterFileSystem:181 1 methods were 
not overridden correctly - see log
[ERROR]   Run 2: TestFilterFileSystem.testFilterFileSystem:181 1 methods were 
not overridden correctly - see log
[ERROR]   Run 3: TestFilterFileSystem.testFilterFileSystem:181 1 methods were 
not overridden correctly - see log
[INFO] 
[ERROR] org.apache.hadoop.fs.TestHarFileSystem.testInheritedMethodsImplemented
[ERROR]   Run 1: TestHarFileSystem.testInheritedMethodsImplemented:402 1 
methods were not overridden correctly - see log
[ERROR]   Run 2: TestHarFileSystem.testInheritedMethodsImplemented:402 1 
methods were not overridden correctly - see log
[ERROR]   Run 3: TestHarFileSystem.testInheritedMethodsImplemented:402 1 
methods were not overridden correctly - see log

{code}
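
The usual fix pattern, as a hedged sketch rather than the actual patch: both 
tests keep a nested interface listing FileSystem methods the wrappers are not 
expected to re-implement, and the new API method gets declared there:

{code:java}
// inside TestFilterFileSystem / TestHarFileSystem; the method signature is
// assumed from the bulk delete API of HADOOP-18679
private interface MustNotImplement {
  BulkDelete createBulkDelete(Path path) throws IOException;
}
{code}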




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19187) ABFS: Making AbfsClient Abstract for supporting both DFS and Blob Endpoint

2024-05-27 Thread Anuj Modi (Jira)
Anuj Modi created HADOOP-19187:
--

 Summary: ABFS: Making AbfsClient Abstract for supporting both DFS 
and Blob Endpoint
 Key: HADOOP-19187
 URL: https://issues.apache.org/jira/browse/HADOOP-19187
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Affects Versions: 3.4.0
Reporter: Anuj Modi
Assignee: Anuj Modi
 Fix For: 3.5.0, 3.4.1


Azure Services support two different sets of APIs.
Blob: 
[https://learn.microsoft.com/en-us/rest/api/storageservices/blob-service-rest-api]
 
DFS: 
[https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/operation-groups]
 

As per the plan in HADOOP-19179, this task enables the ABFS driver to work with 
both sets of APIs as required.

The scope of this task is to refactor AbfsClient so that AbfsStore can choose 
which client to interact with, based on the endpoint configured by the user.

The Blob endpoint support will remain "Unsupported" until the whole code is 
checked in and well tested.
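
A hedged sketch of the refactoring shape only; apart from AbfsClient itself, 
the names below are assumptions, not the actual patch:

{code:java}
// one logical operation, implemented per endpoint's REST API
abstract class AbfsClient {
  abstract String createPath(String path, boolean isFile);
}

class AbfsDfsClient extends AbfsClient {
  @Override
  String createPath(String path, boolean isFile) {
    return "PUT dfs endpoint: " + path;   // stand-in for the DFS REST call
  }
}

class AbfsBlobClient extends AbfsClient {
  @Override
  String createPath(String path, boolean isFile) {
    return "PUT blob endpoint: " + path;  // stand-in for the Blob REST call
  }
}
{code}

AbfsStore would then hold a single AbfsClient reference, chosen at 
initialization from the configured endpoint.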



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18962) Upgrade kafka to 3.4.0

2024-05-24 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18962.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Upgrade kafka to 3.4.0
> --
>
> Key: HADOOP-18962
> URL: https://issues.apache.org/jira/browse/HADOOP-18962
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Upgrade kafka-clients to 3.4.0 to fix 
> https://nvd.nist.gov/vuln/detail/CVE-2023-25194



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19186) Change loglevel to ERROR/WARNING so that it would be easy to identify the problem without ignoring it

2024-05-24 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created HADOOP-19186:
-

 Summary: Change loglevel to ERROR/WARNING so that it would be easy to 
identify the problem without ignoring it
 Key: HADOOP-19186
 URL: https://issues.apache.org/jira/browse/HADOOP-19186
 Project: Hadoop Common
  Issue Type: Improvement
  Components: security
Reporter: Srinivasu Majeti


On a new host with Java version 11, the DN was not able to communicate with 
the NN. We enabled DEBUG logging for the DN and the message below was logged 
at DEBUG level.

DEBUG org.apache.hadoop.security.UserGroupInformation: 
PrivilegedActionException as:hdfs/av3l704p.bigdata.it.internal@PRODUCTION.LOCAL 
(auth:KERBEROS) cause:javax.security.sasl.SaslExcept
ion: GSS initiate failed [Caused by GSSException: No valid credentials provided 
(Mechanism level: Receive timed out)]

Without DEBUG level logging, this showed up only as the WARNING below:

WARN org.apache.hadoop.ipc.Client: Couldn't setup connection for 
hdfs/av3l704p.bigdata.it.internal@PRODUCTION.LOCAL to 
avl2785p.bigdata.it.internal/172.24.178.32:8022
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Receive timed out)]

A considerable amount of time was spent troubleshooting this issue, as the 
underlying exception had been moved to DEBUG level and was difficult to track 
in the logs.

Can we have such critical warnings logged at the WARN/ERROR level, so that 
they are not missed without having to enable DEBUG level logging for the 
datanodes?
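
Until any change of level lands, a hedged log4j.properties sketch for surfacing 
these messages on a datanode; the logger names are simply the classes quoted 
above, and this is standard log4j logger configuration rather than anything 
Hadoop-specific:

{code}
# raise verbosity only for the two classes involved, not the whole DN
log4j.logger.org.apache.hadoop.security.UserGroupInformation=DEBUG
log4j.logger.org.apache.hadoop.ipc.Client=DEBUG
{code}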



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19168) Upgrade Kafka Clients due to CVEs

2024-05-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19168.
-
Resolution: Duplicate

Rohit, dupe of HADOOP-18962. Let's focus on that.

> Upgrade Kafka Clients due to CVEs
> -
>
> Key: HADOOP-19168
> URL: https://issues.apache.org/jira/browse/HADOOP-19168
> Project: Hadoop Common
>  Issue Type: Task
>Reporter: Rohit Kumar
>Priority: Major
>  Labels: pull-request-available
>
> Upgrade Kafka Clients due to CVEs
> CVE-2023-25194:- Affected versions of this package are vulnerable to 
> Deserialization of Untrusted Data when there are gadgets in the 
> {{classpath}}. The server will connect to the attacker's LDAP server and 
> deserialize the LDAP response, which the attacker can use to execute java 
> deserialization gadget chains on the Kafka connect server.
> CVSS Score:- 8.8(High)
> [https://nvd.nist.gov/vuln/detail/CVE-2023-25194] 
> CVE-2021-38153
> CVE-2018-17196
> Insufficient Entropy
> [https://security.snyk.io/package/maven/org.apache.kafka:kafka-clients] 
> Upgrade Kafka-Clients to 3.4.0 or higher.
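
For reference, a hedged sketch of the dependency pin involved; where exactly 
the Hadoop build sets this version (a property in hadoop-project/pom.xml or 
elsewhere) is not shown here:

{code:xml}
<dependency>
  <groupId>org.apache.kafka</groupId>
  <artifactId>kafka-clients</artifactId>
  <version>3.4.0</version> <!-- picks up the CVE-2023-25194 fix -->
</dependency>
{code}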



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19182) Upgrade kafka to 3.4.0

2024-05-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19182.
-
Resolution: Duplicate

> Upgrade kafka to 3.4.0
> --
>
> Key: HADOOP-19182
> URL: https://issues.apache.org/jira/browse/HADOOP-19182
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Reporter: fuchaohong
>Priority: Major
>  Labels: pull-request-available
>
> Upgrade kafka to 3.4.0 to resolve CVE-2023-25194



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19185) Improve ABFS metric integration with IOStatistics

2024-05-23 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19185:
---

 Summary: Improve ABFS metric integration with IOStatistics
 Key: HADOOP-19185
 URL: https://issues.apache.org/jira/browse/HADOOP-19185
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Steve Loughran


Followup to HADOOP-18325 covering the outstanding comments of

https://github.com/apache/hadoop/pull/6314/files





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18325) ABFS: Add correlated metric support for ABFS operations

2024-05-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18325.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> ABFS: Add correlated metric support for ABFS operations
> ---
>
> Key: HADOOP-18325
> URL: https://issues.apache.org/jira/browse/HADOOP-18325
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.3
>Reporter: Anmol Asrani
>Assignee: Anmol Asrani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Add metrics related to a particular job, covering the total number of 
> requests, retried requests, retry counts, and more.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19184) TestStagingCommitter.testJobCommitFailure failing

2024-05-22 Thread Mukund Thakur (Jira)
Mukund Thakur created HADOOP-19184:
--

 Summary: TestStagingCommitter.testJobCommitFailure failing 
 Key: HADOOP-19184
 URL: https://issues.apache.org/jira/browse/HADOOP-19184
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Mukund Thakur
Assignee: Mukund Thakur


[INFO] [ERROR] Failures: [ERROR] TestStagingCommitter.testJobCommitFailure:662 
[Committed objects compared to deleted paths 
org.apache.hadoop.fs.s3a.commit.staging.StagingTestBase$ClientResults@2de1acf4{
 requests=12, uploads=12, parts=12, tagsByUpload=12, commits=5, aborts=7, 
deletes=0}] Expecting: 
<["s3a://bucket-name/output/path/r_0_0_c055250c-58c7-47ea-8b14-215cb5462e89", 
"s3a://bucket-name/output/path/r_1_1_9111aa65-96c2-465c-b278-696aff7707e3", 
"s3a://bucket-name/output/path/r_0_0_dec7f398-ee4e-4a53-a783-6b72cead569a", 
"s3a://bucket-name/output/path/r_1_1_39ad0eba-1053-4217-aa63-ddc8edfa7c64", 
"s3a://bucket-name/output/path/r_0_0_6c0518f6-7c1b-418f-a3e4-7db568880e6a"]> to 
contain exactly in any order: <[]> but the following elements were unexpected: 
<["s3a://bucket-name/output/path/r_0_0_c055250c-58c7-47ea-8b14-215cb5462e89", 
"s3a://bucket-name/output/path/r_1_1_9111aa65-96c2-465c-b278-696aff7707e3", 
"s3a://bucket-name/output/path/r_0_0_dec7f398-ee4e-4a53-a783-6b72cead569a", 
"s3a://bucket-name/output/path/r_1_1_39ad0eba-1053-4217-aa63-ddc8edfa7c64", 
"s3a://bucket-name/output/path/r_0_0_6c0518f6-7c1b-418f-a3e4-7db568880e6a"]>



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19183) RBF: Support leader follower mode for multiple subclusters

2024-05-22 Thread Yuanbo Liu (Jira)
Yuanbo Liu created HADOOP-19183:
---

 Summary: RBF: Support leader follower mode for multiple subclusters
 Key: HADOOP-19183
 URL: https://issues.apache.org/jira/browse/HADOOP-19183
 Project: Hadoop Common
  Issue Type: Improvement
  Components: RBF
Reporter: Yuanbo Liu


Currently there are five ordering modes for multiple subclusters:
HASH, LOCAL, RANDOM, HASH_ALL, SPACE.

This proposes a new leader/follower mode: routers try to write to the leader 
subcluster as much as possible, and when reading data, they rank the leader 
subcluster first.
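
For context, a hedged sketch of how an order policy is attached to a mount 
entry today and where the proposed mode would slot in; LEADER_FOLLOWER is the 
proposal here, not an existing value:

{code}
# existing: pick one of HASH, LOCAL, RANDOM, HASH_ALL, SPACE
hdfs dfsrouteradmin -add /data ns-leader,ns-follower /data -order HASH_ALL

# proposed: prefer the leader subcluster for writes, rank it first for reads
hdfs dfsrouteradmin -add /data ns-leader,ns-follower /data -order LEADER_FOLLOWER
{code}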



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19182) Upgrade kafka to 3.4.0

2024-05-22 Thread fuchaohong (Jira)
fuchaohong created HADOOP-19182:
---

 Summary: Upgrade kafka to 3.4.0
 Key: HADOOP-19182
 URL: https://issues.apache.org/jira/browse/HADOOP-19182
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Reporter: fuchaohong






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19163) Upgrade protobuf version to 3.25.3

2024-05-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19163.
-
Resolution: Fixed

Done. Not sure what version to tag this with.

Proposed: we cut a new release of this.

> Upgrade protobuf version to 3.25.3
> --
>
> Key: HADOOP-19163
> URL: https://issues.apache.org/jira/browse/HADOOP-19163
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: hadoop-thirdparty
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-13147) Constructors must not call overrideable methods in PureJavaCrc32C

2024-05-20 Thread Ayush Saxena (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ayush Saxena resolved HADOOP-13147.
---
Fix Version/s: 3.5.0
 Hadoop Flags: Reviewed
   Resolution: Fixed

> Constructors must not call overrideable methods in PureJavaCrc32C
> -
>
> Key: HADOOP-13147
> URL: https://issues.apache.org/jira/browse/HADOOP-13147
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.0.6-alpha
> Environment: 
> http://svn.apache.org/repos/asf/hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/util/PureJavaCrc32C.java
>Reporter: Sebb
>Assignee: Sebb
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Constructors must not call overrideable methods.
> An object is not guaranteed fully constructed until the constructor exits, so 
> the subclass override may not see the fully created parent object.
> This applies to:
> PureJavaCrc32



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19181) IAMCredentialsProvider throttle failures

2024-05-20 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19181:
---

 Summary: IAMCredentialsProvider throttle failures
 Key: HADOOP-19181
 URL: https://issues.apache.org/jira/browse/HADOOP-19181
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Tests report throttling errors in IAM being remapped to no-auth failures.

Again, Impala tests, but with multiple processes on the same host. This means 
that HADOOP-18945 isn't sufficient, as even if it ensures a singleton instance 
for a process
* it doesn't help if there are many test buckets (fixable)
* it doesn't work across processes (not fixable)

We may be able to
* use a singleton across all filesystem instances (see the sketch after this 
list)
* once we know how throttling is reported, handle it through retries + 
error/stats collection
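
A hedged sketch of the singleton idea, purely illustrative; the class name and 
the use of the SDK's default provider chain are assumptions, not HADOOP-18945's 
implementation:

{code:java}
import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;
import software.amazon.awssdk.auth.credentials.DefaultCredentialsProvider;

// One process-wide provider shared by every S3AFileSystem instance, so N
// filesystems (or N test buckets) make one IAM fetch instead of N.
public final class SharedInstanceCredentials {
  private static volatile AwsCredentialsProvider shared;

  private SharedInstanceCredentials() { }

  public static AwsCredentialsProvider get() {
    if (shared == null) {
      synchronized (SharedInstanceCredentials.class) {
        if (shared == null) {
          shared = DefaultCredentialsProvider.create();
        }
      }
    }
    return shared;
  }
}
{code}

Note this still cannot help across processes: a per-process singleton is the 
best a library can do, which is why retry handling of the throttling response 
is the other half. The log below shows the failure mode as surfaced in the 
Impala tests.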


{code}
2024-02-17T18:02:10,175  WARN [TThreadPoolServer WorkerProcess-22] 
fs.FileSystem: Failed to initialize fileystem 
s3a://impala-test-uswest2-1/test-warehouse/test_num_values_def_levels_mismatch_15b31ddb.db/too_many_def_levels:
 java.nio.file.AccessDeniedException: impala-test-uswest2-1: 
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
2024-02-17T18:02:10,175 ERROR [TThreadPoolServer WorkerProcess-22] 
utils.MetaStoreUtils: Got exception: java.nio.file.AccessDeniedException 
impala-test-uswest2-1: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No 
AWS Credentials provided by TemporaryAWSCredentialsProvider 
SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider 
IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
java.nio.file.AccessDeniedException: impala-test-uswest2-1: 
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
at 
org.apache.hadoop.fs.s3a.AWSCredentialProviderList.maybeTranslateCredentialException(AWSCredentialProviderList.java:351)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:201) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:124) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:376) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:372) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:347) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$2(S3AFileSystem.java:972)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
 ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
 ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
 ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2748)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:970)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.doBucketProbing(S3AFileSystem.java:859) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:715) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3452) 
~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.FileSystem.access
{code}

[jira] [Resolved] (HADOOP-19167) Change of Codec configuration does not work

2024-05-16 Thread ZanderXu (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19167?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ZanderXu resolved HADOOP-19167.
---
Fix Version/s: 3.5.0
   Resolution: Fixed

> Change of Codec configuration does not work
> ---
>
> Key: HADOOP-19167
> URL: https://issues.apache.org/jira/browse/HADOOP-19167
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: compress
>Reporter: Zhikai Hu
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> In one of my projects, I need to dynamically adjust compression level for 
> different files. 
> However, I found that in most cases the new compression level does not take 
> effect as expected; the old compression level continues to be used.
> Here is the relevant code snippet:
> ZStandardCodec zStandardCodec = new ZStandardCodec();
> zStandardCodec.setConf(conf);
> conf.set("io.compression.codec.zstd.level", "5"); // level may change dynamically
> conf.set("io.compression.codec.zstd", zStandardCodec.getClass().getName());
> writer = SequenceFile.createWriter(conf,
>     SequenceFile.Writer.file(sequenceFilePath),
>     SequenceFile.Writer.keyClass(LongWritable.class),
>     SequenceFile.Writer.valueClass(BytesWritable.class),
>     SequenceFile.Writer.compression(CompressionType.BLOCK));
> The reason is that the SequenceFile.Writer.init() method calls 
> CodecPool.getCompressor(codec, null) to get a compressor. If the compressor 
> is a reused instance, the conf is not applied because it is passed as null:
> public static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
>   Compressor compressor = borrow(compressorPool, codec.getCompressorType());
>   if (compressor == null) {
>     compressor = codec.createCompressor();
>     LOG.info("Got brand-new compressor [" + codec.getDefaultExtension() + "]");
>   } else {
>     compressor.reinit(conf);   // conf is null here
>     ..
>  
> Please also refer to my unit test to reproduce the bug. 
> To address this bug, I modified the code to ensure that the configuration is 
> read back from the codec when a compressor is reused.
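
A hedged sketch of the fix described above, within CodecPool.getCompressor, 
not the committed patch: when a pooled compressor is reused with a null conf, 
fall back to the codec's own configuration:

{code:java}
public static Compressor getCompressor(CompressionCodec codec, Configuration conf) {
  Compressor compressor = borrow(compressorPool, codec.getCompressorType());
  if (compressor == null) {
    compressor = codec.createCompressor();
    LOG.info("Got brand-new compressor [" + codec.getDefaultExtension() + "]");
  } else {
    if (conf == null && codec instanceof Configurable) {
      // read the configuration back from the codec, so a reused compressor
      // picks up settings such as io.compression.codec.zstd.level
      conf = ((Configurable) codec).getConf();
    }
    compressor.reinit(conf);
  }
  return compressor;
}
{code}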



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org


