[jira] [Resolved] (HADOOP-19286) Support S3A cross region access when S3 region/endpoint is set

2024-10-04 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19286.
-
Resolution: Fixed

> Support S3A cross region access when S3 region/endpoint is set
> --
>
> Key: HADOOP-19286
> URL: https://issues.apache.org/jira/browse/HADOOP-19286
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Currently, when neither the S3 region nor the endpoint is set, the default 
> region is set to us-east-2 with cross-region access enabled. But when a region 
> or endpoint is set, cross-region access is not enabled.
> The proposal here is to carve out cross-region access as a separate config 
> and enable/disable it irrespective of whether region/endpoint is set. This 
> gives more flexibility to the user.
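
A minimal sketch of what the proposal would look like in use. The option name fs.s3a.cross.region.access.enabled is taken from the pull request and should be treated as an assumption:

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CrossRegionExample {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // pin the client to a region...
    conf.set("fs.s3a.endpoint.region", "eu-west-1");
    // ...while still letting the SDK follow cross-region redirects;
    // option name assumed from the PR, verify before relying on it
    conf.setBoolean("fs.s3a.cross.region.access.enabled", true);
    FileSystem fs = new Path("s3a://example-bucket/").getFileSystem(conf);
    System.out.println(fs.getUri());
  }
}
{code}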






[jira] [Reopened] (HADOOP-19286) Support S3A cross region access when S3 region/endpoint is set

2024-10-03 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HADOOP-19286:
-

new test fails for me; I have a bucket-specific region

{code}
[ERROR] 
testWithCrossRegionAccess(org.apache.hadoop.fs.s3a.ITestS3AEndpointRegion)  
Time elapsed: 1.199 s  <<< ERROR!
org.apache.hadoop.fs.s3a.AWSBadRequestException: getFileStatus on 
s3a://stevel-london/user/stevel: 
software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, Status 
Code: 400, Request ID: PANJ0H4G5Z7XDN8K, Extended Request ID: 
0jVye0vK5JIuXLPN2fC3TpqYx/bi5r9Fuk7KdahorhdUJ0IGT/ca392MCjYABvq7IfLMwG/P+7Y=):null:
 null (Service: S3, Status Code: 400, Request ID: PANJ0H4G5Z7XDN8K, Extended 
Request ID: 
0jVye0vK5JIuXLPN2fC3TpqYx/bi5r9Fuk7KdahorhdUJ0IGT/ca392MCjYABvq7IfLMwG/P+7Y=)
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:262)
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:157)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4099)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:4005)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$exists$33(S3AFileSystem.java:5007)
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2863)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2882)
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.exists(S3AFileSystem.java:5005)
at 
org.apache.hadoop.fs.s3a.ITestS3AEndpointRegion.testWithCrossRegionAccess(ITestS3AEndpointRegion.java:384)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at 
org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at 
org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at 
org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at 
org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at 
org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at 
org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
at 
org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:750)
Caused by: software.amazon.awssdk.services.s3.model.S3Exception: null (Service: 
S3, Status Code: 400, Request ID: PANJ0H4G5Z7XDN8K, Extended Request ID: 
0jVye0vK5JIuXLPN2fC3TpqYx/bi5r9Fuk7KdahorhdUJ0IGT/ca392MCjYABvq7IfLMwG/P+7Y=)
at 
software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
at 
software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
at 
software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85)
at 
software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:43)
at 
software.amazon.awssdk.awscore.client.handler.AwsSyncClientHandler$Crc32ValidationResponseHandler.handle(AwsSyncClientHandler.java:93)
at 
software.amazon.awssdk.core.internal.handler.BaseClientHandler.lambda$successTransformationResponseHandler$7(BaseClientHandler.java:279)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:50)
at 
software.amazon.awssdk.core.internal.http.pipeline.stages.HandleResponseStage.execute(HandleResponseStage.java:38)
at 
software.amazon.awssdk.core.internal.http.pipeline.RequestPipelineBuilder$ComposingRequestPipelineStage.execute(RequestPipelineBuilder.java:206)
at 
software.amazon.awssdk.core.internal
{code}

[jira] [Created] (HADOOP-19299) ConcurrentModificationException in HttpReferrerAuditHeader

2024-10-02 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19299:
---

 Summary: ConcurrentModificationException in HttpReferrerAuditHeader
 Key: HADOOP-19299
 URL: https://issues.apache.org/jira/browse/HADOOP-19299
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.4.1
Reporter: Steve Loughran


Surfaced during a test run doing vector IO, where multiple parallel GETs were 
being issued within the same audit span, just when the header is built by 
enumerating the attributes.

{code}
  queries = attributes.entrySet().stream()
  .filter(e -> !filter.contains(e.getKey()))
  .map(e -> e.getKey() + "=" + e.getValue())
  .collect(Collectors.joining("&"));
{code}

Hypothesis: multiple GET requests are conflicting in updating/reading the 
header.
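
One way to make that enumeration safe, sketched under the assumption that the attributes map can be swapped for a concurrent one (not necessarily the fix that will be chosen): a ConcurrentHashMap's iterators are weakly consistent and never throw ConcurrentModificationException.

{code:java}
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;

public class SafeHeaderBuild {
  // weakly consistent iteration: concurrent updates from parallel GETs
  // are tolerated instead of throwing ConcurrentModificationException
  private final Map<String, String> attributes = new ConcurrentHashMap<>();

  String buildQueries(Set<String> filter) {
    return attributes.entrySet().stream()
        .filter(e -> !filter.contains(e.getKey()))
        .map(e -> e.getKey() + "=" + e.getValue())
        .collect(Collectors.joining("&"));
  }
}
{code}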






[jira] [Resolved] (HADOOP-19293) Avoid Subject.getSubject method on newer JVMs

2024-10-02 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19293.
-
Resolution: Duplicate

see discussion on HADOOP-19212

> Avoid Subject.getSubject method on newer JVMs
> -
>
> Key: HADOOP-19293
> URL: https://issues.apache.org/jira/browse/HADOOP-19293
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: auth, common
>Reporter: Justin
>Assignee: Justin
>Priority: Major
>
> In Java 23, Subject.getSubject requires setting the system property 
> java.security.manager to allow, else it will throw an exception. More detail 
> is available in the release notes: https://jdk.java.net/23/release-notes
> This is in support of the eventual removal of the security manager, at which 
> point, Subject.getSubject will be removed.
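
For reference, the migration the JDK release notes point at is Subject.current(), available since Java 18; a minimal before/after sketch:

{code:java}
import javax.security.auth.Subject;

public class SubjectLookup {
  static Subject lookup() {
    // old: throws UnsupportedOperationException on Java 23 unless
    // -Djava.security.manager=allow is set:
    //   Subject subject = Subject.getSubject(AccessController.getContext());

    // replacement, available since Java 18:
    return Subject.current();
  }
}
{code}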






[jira] [Resolved] (HADOOP-19286) Support S3A cross region access when S3 region/endpoint is set

2024-10-01 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19286?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19286.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Support S3A cross region access when S3 region/endpoint is set
> --
>
> Key: HADOOP-19286
> URL: https://issues.apache.org/jira/browse/HADOOP-19286
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Syed Shameerur Rahman
>Assignee: Syed Shameerur Rahman
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Currently, when neither the S3 region nor the endpoint is set, the default 
> region is set to us-east-2 with cross-region access enabled. But when a region 
> or endpoint is set, cross-region access is not enabled.
> The proposal here is to carve out cross-region access as a separate config 
> and enable/disable it irrespective of whether region/endpoint is set. This 
> gives more flexibility to the user.






[jira] [Resolved] (HADOOP-19288) hadoop-client-runtime exclude dnsjava InetAddressResolverProvider

2024-10-01 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19288.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> hadoop-client-runtime exclude dnsjava InetAddressResolverProvider
> -
>
> Key: HADOOP-19288
> URL: https://issues.apache.org/jira/browse/HADOOP-19288
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> [https://github.com/dnsjava/dnsjava/issues/338]
>  
> {code:java}
> Exception in thread "main" java.util.ServiceConfigurationError: 
> java.net.spi.InetAddressResolverProvider: Provider 
> org.apache.hadoop.shaded.org.xbill.DNS.spi.DnsjavaInetAddressResolverProvider 
> not found
>     at java.base/java.util.ServiceLoader.fail(ServiceLoader.java:593)
>     at 
> java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.nextProviderClass(ServiceLoader.java:1219)
>     at 
> java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNextService(ServiceLoader.java:1228)
>     at 
> java.base/java.util.ServiceLoader$LazyClassPathLookupIterator.hasNext(ServiceLoader.java:1273)
>     at java.base/java.util.ServiceLoader$2.hasNext(ServiceLoader.java:1309)
>     at java.base/java.util.ServiceLoader$3.hasNext(ServiceLoader.java:1393)
>     at java.base/java.util.ServiceLoader.findFirst(ServiceLoader.java:1812)
>     at java.base/java.net.InetAddress.loadResolver(InetAddress.java:508)
>     at java.base/java.net.InetAddress.resolver(InetAddress.java:488)
>     at 
> java.base/java.net.InetAddress.getAddressesFromNameService(InetAddress.java:1826)
>     at 
> java.base/java.net.InetAddress$NameServiceAddresses.get(InetAddress.java:1139)
>     at java.base/java.net.InetAddress.getAllByName0(InetAddress.java:1818)
>     at java.base/java.net.InetAddress.getLocalHost(InetAddress.java:1931)
>     at 
> org.apache.logging.log4j.core.util.NetUtils.getLocalHostname(NetUtils.java:56)
>     at 
> org.apache.logging.log4j.core.LoggerContext.lambda$setConfiguration$0(LoggerContext.java:625)
>  {code}
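
For context, a sketch of what the JDK is doing when this blows up: the first name lookup asks ServiceLoader for a resolver provider, and if a META-INF/services entry names a class missing from the shaded jar, ServiceLoader raises the ServiceConfigurationError above. (Java 18+ only; this SPI does not exist in earlier releases.)

{code:java}
import java.net.spi.InetAddressResolverProvider;
import java.util.ServiceLoader;

public class ResolverProbe {
  public static void main(String[] args) {
    // roughly what java.net.InetAddress.loadResolver() does internally;
    // throws ServiceConfigurationError if a service file points at a
    // relocated/absent class such as the shaded dnsjava provider
    ServiceLoader.load(InetAddressResolverProvider.class)
        .findFirst()
        .ifPresentOrElse(
            p -> System.out.println("resolver provider: " + p.name()),
            () -> System.out.println("falling back to the built-in resolver"));
  }
}
{code}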






[jira] [Resolved] (HADOOP-19294) NPE on maven enforcer with -Pnative on arm mac

2024-10-01 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19294.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> NPE on maven enforcer with -Pnative on arm mac
> --
>
> Key: HADOOP-19294
> URL: https://issues.apache.org/jira/browse/HADOOP-19294
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.5.0, 3.4.1
>
>
> If you try to build on an arm mac with -Pnative you get an NPE in enforcer.
> This is independent of whether or not maven can actually compile the native 
> code.
> Upgrading maven to 3.5.0 works.






[jira] [Created] (HADOOP-19295) S3A: fs.s3a.connection.request.timeout too low for large uploads over slow links

2024-09-30 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19295:
---

 Summary: S3A: fs.s3a.connection.request.timeout too low for large 
uploads over slow links
 Key: HADOOP-19295
 URL: https://issues.apache.org/jira/browse/HADOOP-19295
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0, 3.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran


The value of {{fs.s3a.connection.request.timeout}} (default = 60s) is too low 
for large uploads over slow connections.

I suspect something changed between the v1 and v2 SDK versions so that PUT was 
exempt from the normal timeouts. It is not, and this now surfaces in failures to 
upload 1+ GB files over slower network connections. Smaller (for example 128 
MB) files work.

The parallel queuing of writes in the S3ABlockOutputStream helps create 
this problem as it queues multiple blocks at the same time, so per-block 
bandwidth becomes available bandwidth/number of blocks; four blocks cut the 
capacity down to a quarter.

The fix is straightforward: use a much bigger timeout. I'm going to propose 15 
minutes. We need to strike a balance between upload time allocation and other 
requests timing out.

I do worry about other consequences; we've found that timeout exceptions are 
happy to hide the underlying causes of retry failures -so in fact this may be 
better for all but a server hanging after the HTTP request is initiated.

Too bad we can't alter the timeout for different requests.
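
A sketch of the proposed change from the user side; fs.s3a.connection.request.timeout accepts a time suffix in recent releases, so the 15-minute proposal would look like:

{code:java}
import org.apache.hadoop.conf.Configuration;

public class RequestTimeout {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // proposed new default: 15 minutes rather than 60s, so a whole
    // block upload can crawl over a slow link without being killed
    conf.set("fs.s3a.connection.request.timeout", "15m");
  }
}
{code}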






[jira] [Created] (HADOOP-19294) NPE on maven enforcer with -Pnative on arm mac

2024-09-30 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19294:
---

 Summary: NPE on maven enforcer with -Pnative on arm mac
 Key: HADOOP-19294
 URL: https://issues.apache.org/jira/browse/HADOOP-19294
 Project: Hadoop Common
  Issue Type: Bug
  Components: build
Affects Versions: 3.5.0, 3.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran


If you try to build on an arm mac with -Pnative you get an NPE in enforcer.

This is independent of whether or not maven can actually compile the native 
code.

Upgrading maven to 3.5.0 works.






[jira] [Resolved] (HADOOP-19280) ABFS: Initialize ABFS client timer only when metric collection is enabled

2024-09-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19280.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> ABFS: Initialize ABFS client timer only when metric collection is enabled
> -
>
> Key: HADOOP-19280
> URL: https://issues.apache.org/jira/browse/HADOOP-19280
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Manish Bhatt
>Assignee: Manish Bhatt
>Priority: Major
> Fix For: 3.5.0
>
>
> In the current flow, we are initializing the timer of the 
> {{abfs-timer-client}} outside the metric collection enable check. As a 
> result, for each file system, when the {{AbfsClient}} object is initialized, 
> it spawns a thread to evaluate the time of the ABFS client. Since we are 
> purging/closing the timer inside the metric collection check, these threads 
> are not being closed, causing them to persist in a long-lived state. To fix 
> this, we are moving the timer initialization inside the condition check.
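
The shape of the fix, sketched with illustrative names (the constructor flag and timer field are placeholders, not the actual AbfsClient members):

{code:java}
import java.util.Timer;

public class TimerGuard {
  private Timer timer; // stays null when metrics are off

  TimerGuard(boolean metricCollectionEnabled) {
    if (metricCollectionEnabled) {
      // daemon thread, so a leaked timer can't block JVM shutdown
      timer = new Timer("abfs-timer-client", true);
    }
  }

  void close() {
    if (timer != null) {
      timer.cancel();
    }
  }
}
{code}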






[jira] [Resolved] (HADOOP-19281) MetricsSystemImpl should not print INFO message in CLI

2024-09-27 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19281.
-
Resolution: Fixed

> MetricsSystemImpl should not print INFO message in CLI
> --
>
> Key: HADOOP-19281
> URL: https://issues.apache.org/jira/browse/HADOOP-19281
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Tsz-wo Sze
>Assignee: Sarveksha Yeshavantha Raju
>Priority: Major
>  Labels: newbie, pull-request-available
> Fix For: 3.5.0, 3.4.2
>
> Attachments: 7071_review.patch
>
>
> Below is an example:
> {code}
> # hadoop fs  -Dfs.s3a.bucket.probe=0 
> -Dfs.s3a.change.detection.version.required=false 
> -Dfs.s3a.change.detection.mode=none -Dfs.s3a.endpoint=http://some.site:9878 
> -Dfs.s3a.access.keysome=systest -Dfs.s3a.secret.key=8...1 
> -Dfs.s3a.endpoint=http://some.site:9878  -Dfs.s3a.path.style.access=true 
> -Dfs.s3a.impl=org.apache.hadoop.fs.s3a.S3AFileSystem   -ls  -R s3a://bucket1/
> 24/09/17 10:47:48 WARN impl.MetricsConfig: Cannot locate configuration: tried 
> hadoop-metrics2-s3a-file-system.properties,hadoop-metrics2.properties
> 24/09/17 10:47:48 INFO impl.MetricsSystemImpl: Scheduled Metric snapshot 
> period at 10 second(s).
> 24/09/17 10:47:48 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> started
> 24/09/17 10:47:48 WARN impl.ConfigurationHelper: Option 
> fs.s3a.connection.establish.timeout is too low (5,000 ms). Setting to 15,000 
> ms instead
> 24/09/17 10:47:50 WARN s3.S3TransferManager: The provided S3AsyncClient is an 
> instance of MultipartS3AsyncClient, and thus multipart download feature is 
> not enabled. To benefit from all features, consider using 
> S3AsyncClient.crtBuilder().build() instead.
> drwxrwxrwx   - root root  0 2024-09-17 10:47 s3a://bucket1/dir1
> 24/09/17 10:47:53 INFO impl.MetricsSystemImpl: Stopping s3a-file-system 
> metrics system...
> 24/09/17 10:47:53 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> stopped.
> 24/09/17 10:47:53 INFO impl.MetricsSystemImpl: s3a-file-system metrics system 
> shutdown complete. 
> {code}






[jira] [Created] (HADOOP-19285) Restore ETAGS_AVAILABLE to abfs path capabilities

2024-09-23 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19285:
---

 Summary: Restore ETAGS_AVAILABLE to abfs path capabilities
 Key: HADOOP-19285
 URL: https://issues.apache.org/jira/browse/HADOOP-19285
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/azure
Affects Versions: 3.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran


HADOOP-19131 accidentally deleted {{CommonPathCapabilities.ETAGS_AVAILABLE}} 
from the path capabilities of abfs. 

Restore it.






[jira] [Resolved] (HADOOP-19272) S3A: AWS SDK 2.25.53 warnings logged about transfer manager not using CRT client

2024-09-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19272.
-
Resolution: Fixed

> S3A: AWS SDK 2.25.53 warnings logged about transfer manager not using CRT 
> client
> 
>
> Key: HADOOP-19272
> URL: https://issues.apache.org/jira/browse/HADOOP-19272
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.2
>
> Attachments: output.txt
>
>
> When an S3 transfer manager is created for renaming/downloading, a new message 
> is logged telling off the caller for not using the CRT client.
> {code}
> 5645:2024-09-13 16:29:17,375 [setup] WARN  s3.S3TransferManager 
> (LoggerAdapter.java:warn(225)) - The provided S3AsyncClient is an instance of 
> MultipartS3AsyncClient, and thus multipart download feature is not enabled. 
> To benefit from all features, consider using 
> S3AsyncClient.crtBuilder().build() instead.
> {code}
> This is a change in the SDK to tell us developers off -yet it is visible to 
> end users who don't benefit from it and for whom it only creates confusion.
> It appears to have been downgraded to debug in the AWS trunk code in PR "S3 
> Async Client - Multipart download (#5164)" -but:
> * it is too late to upgrade and qualify a new version for 3.4.1; downgrading 
> is all we can do
> * there is no guarantee this log message or something similar will not reoccur.
> Plan
> 1. Revert from 3.4.1
> 2. Lift code from the cloudstore library, which uses reflection to access and 
> manipulate log4j logs where present
> 3. Downgrade all transfer manager log levels to NONE
> 4. File an AWS report about how this is an incompatible regression; identify 
> how their process can evolve, particularly in the area of code guidelines 
> about safe logging use.
> I also intend to tighten up our review process to support more rigorous 
> detection of new .warn() messages in the AWS SDK. I'm going to propose that, 
> as well as requiring review of our test/CLI output, we require ripgrep scans 
> of .warn(/.error( in SDK source and an audit of any new changes. By saving the 
> output of the previous iteration, it'll be straightforward to identify new 
> changes -but not changes in codepaths which change their frequency of 
> appearance.
> I think we should revisit whether or not to move off the xfer manager. We've 
> discussed it in the past, and avoided it just due to maintenance costs. 
> However, staying on it is pushing maintenance costs onto us anyway.
> Meanwhile: no new AWS SDK updates until we are confident we have our 
> processes under control.
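
A sketch of steps 2 and 3, along the lines of the cloudstore trick: reach for log4j 1.x purely through reflection, so nothing breaks when a different logging backend is on the classpath.

{code:java}
public class LogControl {
  /** Silence a log4j 1.x logger by name; a no-op on other backends. */
  static void silence(String logName) {
    try {
      Class<?> manager = Class.forName("org.apache.log4j.LogManager");
      Class<?> level = Class.forName("org.apache.log4j.Level");
      Object logger = manager.getMethod("getLogger", String.class)
          .invoke(null, logName);
      Object off = level.getField("OFF").get(null);
      logger.getClass().getMethod("setLevel", level).invoke(logger, off);
    } catch (ReflectiveOperationException e) {
      // log4j 1.x not present: leave the logger alone
    }
  }

  public static void main(String[] args) {
    silence("software.amazon.awssdk.transfer.s3.S3TransferManager");
  }
}
{code}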






[jira] [Created] (HADOOP-19278) S3A: remove option to delete directory markers

2024-09-17 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19278:
---

 Summary: S3A: remove option to delete directory markers
 Key: HADOOP-19278
 URL: https://issues.apache.org/jira/browse/HADOOP-19278
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Affects Versions: 3.4.1
Reporter: Steve Loughran


We've supported directory marker retention since HADOOP-13230 went in,

and switched to making it the default in HADOOP-18752.

Nobody has ever complained about this.

Proposed: cut directory marker deletion entirely.
This will
* simplify our code
* cut down on test options






[jira] [Resolved] (HADOOP-19221) S3A: Unable to recover from failure of multipart block upload attempt "Status Code: 400; Error Code: RequestTimeout"

2024-09-16 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19221.
-
Fix Version/s: 3.4.2
   Resolution: Fixed

> S3A: Unable to recover from failure of multipart block upload attempt "Status 
> Code: 400; Error Code: RequestTimeout"
> 
>
> Key: HADOOP-19221
> URL: https://issues.apache.org/jira/browse/HADOOP-19221
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.2
>
>
> If a multipart PUT request fails for some reason (e.g. network error) then 
> all subsequent retry attempts fail with a 400 Response and ErrorCode 
> RequestTimeout.
> {code}
> Your socket connection to the server was not read from or written to within 
> the timeout period. Idle connections will be closed. (Service: Amazon S3; 
> Status Code: 400; Error Code: RequestTimeout; Request ID:; S3 Extended 
> Request ID:
> {code}
> The list of suppressed exceptions contains the root cause (the initial 
> failure was a 500); all retries failed to upload properly from the source 
> input stream {{RequestBody.fromInputStream(fileStream, size)}}.
> Hypothesis: the mark/reset stuff doesn't work for input streams. On the v1 
> SDK we would build a multipart block upload request passing in (file, offset, 
> length); the way we are now doing this doesn't recover.
> probably fixable by providing our own {{ContentStreamProvider}} 
> implementations for
> # file + offset + length
> # bytebuffer
> # byte array
> The sdk does have explicit support for the memory ones, but they copy the 
> data blocks first. we don't want that as it would double the memory 
> requirements of active blocks.
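
A sketch of option 1 above (file + offset + length), assuming commons-io for the skip/bound helpers; the point is that every retry gets a fresh stream positioned at the block's offset, so no mark/reset is needed:

{code:java}
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import org.apache.commons.io.IOUtils;
import org.apache.commons.io.input.BoundedInputStream;
import software.amazon.awssdk.http.ContentStreamProvider;

final class FileSliceStreamProvider implements ContentStreamProvider {
  private final File file;
  private final long offset;
  private final long length;

  FileSliceStreamProvider(File file, long offset, long length) {
    this.file = file;
    this.offset = offset;
    this.length = length;
  }

  @Override
  public InputStream newStream() {
    // called again on every retry: reopen and reposition, rather than
    // relying on mark/reset of a half-consumed stream
    try {
      FileInputStream in = new FileInputStream(file);
      IOUtils.skipFully(in, offset);
      return new BoundedInputStream(in, length);
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
{code}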






[jira] [Created] (HADOOP-19274) S3A: new test failures on branch-3.4.1

2024-09-13 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19274:
---

 Summary: S3A: new test failures on branch-3.4.1
 Key: HADOOP-19274
 URL: https://issues.apache.org/jira/browse/HADOOP-19274
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3, test
Affects Versions: 3.4.1
Reporter: Steve Loughran


test failures on a -Dscale run; s3 usw-1, caller has IAM role.
{code}
[INFO] Running org.apache.hadoop.fs.s3a.ITestS3AClosedFS
[ERROR] Tests run: 8, Failures: 1, Errors: 0, Skipped: 0, Time elapsed: 1.497 s 
<<< FAILURE! - in org.apache.hadoop.fs.s3a.ITestS3AClosedFS
[ERROR] testClosedInstrumentation(org.apache.hadoop.fs.s3a.ITestS3AClosedFS)  
Time elapsed: 0.143 s  <<< FAILURE!
org.junit.ComparisonFailure: [S3AInstrumentation.hasMetricSystem()] 
expected:<[fals]e> but was:<[tru]e>
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at 
sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at 
org.apache.hadoop.fs.s3a.ITestS3AClosedFS.testClosedInstrumentation(ITestS3AClosedFS.java:111)
{code}







[jira] [Created] (HADOOP-19273) S3A: Committer ITests failing unless job.id set

2024-09-13 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19273:
---

 Summary: S3A: Committer ITests failing unless job.id set
 Key: HADOOP-19273
 URL: https://issues.apache.org/jira/browse/HADOOP-19273
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3, test
Affects Versions: 3.5.0, 3.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran


the job.id maven setup is broken on parallel runs: unless it is explicitly set on 
the command line, tests which try to parse its trailing bits as a number will fail.

{code}
[ERROR] testBinding[File committer in task-fs=[]-task=[file]-[class 
org.apache.hadoop.mapreduce.lib.output.FileOutputCommitter]](org.apache.hadoop.fs.s3a.commit.ITestS3ACommitterFactory)
  Time elapsed: 0.181 s  <<< ERROR!
java.lang.Exception: Failed to parse b-00
{code}

Emergency workaround: set it
{code}
-Djob.id=0001
{code}







[jira] [Created] (HADOOP-19272) S3A: warnings logged about transfer manager not using Crt client

2024-09-13 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19272:
---

 Summary: S3A: warnings logged about transfer manager not using Crt 
client
 Key: HADOOP-19272
 URL: https://issues.apache.org/jira/browse/HADOOP-19272
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


When an S3 transfer manager is created for renaming/downloading, a new message 
is logged telling off the caller for not using the CRT client.

{code}
5645:2024-09-13 16:29:17,375 [setup] WARN  s3.S3TransferManager 
(LoggerAdapter.java:warn(225)) - The provided S3AsyncClient is an instance of 
MultipartS3AsyncClient, and thus multipart download feature is not enabled. To 
benefit from all features, consider using S3AsyncClient.crtBuilder().build() 
instead.
{code}

This is
* a change in the SDK to tell us off
* downgraded to debug in the AWS trunk code "S3 Async Client - Multipart 
download (#5164)"










[jira] [Resolved] (HADOOP-19189) ITestS3ACommitterFactory failing

2024-09-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19189?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19189.
-
Resolution: Fixed

Thanks. I'd probably kept it open for the backport and forgot to close it. 
Verified which branches have the fix: it's 3.4.1+

> ITestS3ACommitterFactory failing
> 
>
> Key: HADOOP-19189
> URL: https://issues.apache.org/jira/browse/HADOOP-19189
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> we've had ITestS3ACommitterFactory failing for a while, where it looks like 
> changed committer settings aren't being picked up.
> {code}
> ERROR] 
> ITestS3ACommitterFactory.testEverything:115->testInvalidFileBinding:165 
> Expected a org.apache.hadoop.fs.s3a.commit.PathCommitException to be thrown, 
> but got the result: : 
> FileOutputCommitter{PathOutputCommitter{context=TaskAttemptContextImpl{JobContextImpl
> {code}
> I've spent some time looking at it and it is happening because the test sets 
> the filesystem ref for the local test fs, and not that of the filesystem 
> created by the committer, which is where the option is picked up.
> I've tried to parameterize it but things are still playing up and I'm not 
> sure how hard to try to fix it.






[jira] [Resolved] (HADOOP-19201) S3A: Support external id in assume role

2024-09-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19201.
-
Fix Version/s: 3.5.0
   3.4.2
   Resolution: Fixed

> S3A: Support external id in assume role
> ---
>
> Key: HADOOP-19201
> URL: https://issues.apache.org/jira/browse/HADOOP-19201
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Smith Cruise
>Assignee: Smith Cruise
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.2
>
>
> Extend IAM role support with external IDs, which can be set in 
> fs.s3a.assumed.role.external.id.
> Support external id in AssumedRoleCredentialProvider.
>  
> https://docs.aws.amazon.com/IAM/latest/UserGuide/id_roles_create_for-user_externalid.html
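
What usage would look like, sketched with an illustrative role ARN and external id (only fs.s3a.assumed.role.external.id is the new option from this issue):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class AssumedRoleSetup {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider");
    conf.set("fs.s3a.assumed.role.arn",
        "arn:aws:iam::111122223333:role/example-role"); // illustrative
    // the new option from this issue:
    conf.set("fs.s3a.assumed.role.external.id", "example-external-id");
  }
}
{code}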






[jira] [Resolved] (HADOOP-19252) Release Hadoop Third-Party 1.3.0

2024-09-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19252.
-
Fix Version/s: 3.4.1
   thirdparty-1.3.0
   Resolution: Fixed

> Release Hadoop Third-Party 1.3.0
> 
>
> Key: HADOOP-19252
> URL: https://issues.apache.org/jira/browse/HADOOP-19252
> Project: Hadoop Common
>  Issue Type: Task
>  Components: hadoop-thirdparty
>Affects Versions: thirdparty-1.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.1, thirdparty-1.3.0
>
>
> Create a release of the thirdparty JAR with the protobuf version compatible 
> with all Java 8 builds.






[jira] [Resolved] (HADOOP-19269) Upgrade maven-shade-plugin to 3.6.0

2024-09-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19269.
-
Fix Version/s: 3.5.0
 Assignee: PJ Fanning
   Resolution: Fixed

> Upgrade maven-shade-plugin to 3.6.0
> ---
>
> Key: HADOOP-19269
> URL: https://issues.apache.org/jira/browse/HADOOP-19269
> Project: Hadoop Common
>  Issue Type: Task
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> I found an issue when testing with Jackson 2.18.0-rc1.
> Hadoop bundles the Jackson and other 3rd-party classes into Hadoop fat jars 
> using maven-shade-plugin. 
> Jackson is a Multi-Release jar. https://openjdk.org/jeps/238
> The most recent jackson-core jars have classes under META-INF/versions/21. 
> The existing maven-shade-plugin version in the Hadoop build fails with ASM 
> issues on Java 21 classes. maven-shade-plugin 3.6.0 handles them fine.






[jira] [Resolved] (HADOOP-16928) [JDK13] Support HTML5 Javadoc

2024-09-04 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-16928.
-
Fix Version/s: 3.5.0
 Assignee: Cheng Pan
   Resolution: Fixed

> [JDK13] Support HTML5 Javadoc
> -
>
> Key: HADOOP-16928
> URL: https://issues.apache.org/jira/browse/HADOOP-16928
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Akira Ajisaka
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> javadoc -html4 option has been removed since Java 13.
> https://bugs.openjdk.java.net/browse/JDK-8215578






[jira] [Resolved] (HADOOP-19257) S3A: ITestAssumeRole.testAssumeRoleBadInnerAuth failure

2024-09-03 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19257.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> S3A: ITestAssumeRole.testAssumeRoleBadInnerAuth failure
> ---
>
> Key: HADOOP-19257
> URL: https://issues.apache.org/jira/browse/HADOOP-19257
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0, 3.4.1
>Reporter: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Not sure when this changed, but I've only just noticed today while setting up 
> a new test config.
> The test {{testAssumeRoleBadInnerAuth}} is failing because the error string 
> coming back from STS is slightly different.
> {code}
> [ERROR] 
> testAssumeRoleBadInnerAuth(org.apache.hadoop.fs.s3a.auth.ITestAssumeRole)  
> Time elapsed: 4.182 s  <<< FAILURE!
> java.lang.AssertionError: 
>  Expected to find 'not a valid key=value pair (missing equal-sign) in 
> Authorization header' but got unexpected exception: 
> org.apache.hadoop.fs.s3a.AWSBadRequestException: Instantiate 
> org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider on /: 
> software.amazon.awssdk.services.sts.model.StsException: Invalid key=value 
> pair (missing equal-sign) 
> {code}
> Rather than change the string to look for, let's just remove the string so it 
> is less brittle in future






[jira] [Created] (HADOOP-19262) wildfly-openssl:1.1.3.Final seems to be incompatible with jdk17

2024-09-03 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19262:
---

 Summary: wildfly-openssl:1.1.3.Final seems to be incompatible with 
jdk17
 Key: HADOOP-19262
 URL: https://issues.apache.org/jira/browse/HADOOP-19262
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure, fs/s3
Affects Versions: 3.5.0
Reporter: Steve Loughran


Apparently wildfly has to be updated to 2.1.4.Final to work on Java 17+; 
without that, s3 and azure need to be configured to not use OpenSSL to connect 
to stores






[jira] [Created] (HADOOP-19260) removal of GCM TLS ciphers blocking abfs access "No negotiable cipher suite"

2024-09-02 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19260:
---

 Summary: removal of GCM TLS ciphers blocking abfs access "No 
negotiable cipher suite"
 Key: HADOOP-19260
 URL: https://issues.apache.org/jira/browse/HADOOP-19260
 Project: Hadoop Common
  Issue Type: Bug
  Components: common, fs/azure
Affects Versions: 3.4.0
Reporter: Steve Loughran


we've seen instances of client-abfs TLS negotiation failing with "No negotiable 
cipher suite". This can be fixed by switching to "Default_JSSE_with_GCM" 
as the SSL option.

However, DelegatingSSLSocketFactory "Default" attempts OpenSSL, falling back to 
{code}
Default indicates Ordered, preferred OpenSSL, if failed to load then fall
 back to Default_JSSE
{code}

And " Default_JSSE is not truly the the default JSSE implementation because
the GCM cipher is disabled when running on Java "

What does that mean? It means that the "Default" TLS option of "try 
openssl and fall back to java" doesn't ever turn on GCM encryption.

Proposed:
* "Default" falls back to GCM
* add an option {{Default_JSSE_No_GCM}}

Once we move off Java 8, turning off GCM is no longer needed for performance, 
hopefully (benchmarks would be good here).
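
The existing workaround, sketched on the assumption that the relevant option key is fs.azure.ssl.channel.mode (values map to DelegatingSSLSocketFactory.SSLChannelMode):

{code:java}
import org.apache.hadoop.conf.Configuration;

public class AbfsSslMode {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // skip OpenSSL entirely and keep the GCM ciphers enabled;
    // option key assumed from hadoop-azure, verify before relying on it
    conf.set("fs.azure.ssl.channel.mode", "Default_JSSE_with_GCM");
  }
}
{code}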






[jira] [Resolved] (HADOOP-19250) Fix test TestServiceInterruptHandling.testRegisterAndRaise

2024-08-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19250.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Fix test TestServiceInterruptHandling.testRegisterAndRaise
> --
>
> Key: HADOOP-19250
> URL: https://issues.apache.org/jira/browse/HADOOP-19250
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: test
>Reporter: Chenyu Zheng
>Assignee: Chenyu Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> If the test runs on some slow server, testRegisterAndRaise may fail; the error stack is:
> ```
> [ERROR] 
> testRegisterAndRaise(org.apache.hadoop.service.launcher.TestServiceInterruptHandling)
>  Time elapsed: 0.513 s <<< FAILURE! java.lang.AssertionError: interrupt data 
> at org.junit.Assert.fail(Assert.java:89) at 
> org.junit.Assert.assertTrue(Assert.java:42) at 
> org.junit.Assert.assertNotNull(Assert.java:713) at 
> org.apache.hadoop.service.launcher.TestServiceInterruptHandling.testRegisterAndRaise(TestServiceInterruptHandling.java:48)
>  at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
>  at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>  at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
>  at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>  at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26) 
> at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27) 
> at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61) at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
>  at 
> org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
>  at java.util.concurrent.FutureTask.run(FutureTask.java:266) at 
> java.lang.Thread.run(Thread.java:750)
> ```
> The error link is 
> https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-6813/7/artifact/out/patch-unit-hadoop-common-project_hadoop-common.txt






[jira] [Resolved] (HADOOP-19248) Protobuf code generate and replace should happen together

2024-08-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19248.
-
Resolution: Fixed

> Protobuf code generate and replace should happen together
> -
>
> Key: HADOOP-19248
> URL: https://issues.apache.org/jira/browse/HADOOP-19248
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.2
>
>







[jira] [Created] (HADOOP-19257) S3A: ITestAssumeRole.testAssumeRoleBadInnerAuth failure

2024-08-29 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19257:
---

 Summary: S3A: ITestAssumeRole.testAssumeRoleBadInnerAuth failure
 Key: HADOOP-19257
 URL: https://issues.apache.org/jira/browse/HADOOP-19257
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.4.0, 3.4.1
Reporter: Steve Loughran


Not sure when this changed, but I've only just noticed today while setting up a 
new test config.

The test {{testAssumeRoleBadInnerAuth}} is failing because the error string 
coming back from STS is slightly different.

{code}
[ERROR] 
testAssumeRoleBadInnerAuth(org.apache.hadoop.fs.s3a.auth.ITestAssumeRole)  Time 
elapsed: 4.182 s  <<< FAILURE!
java.lang.AssertionError: 
 Expected to find 'not a valid key=value pair (missing equal-sign) in 
Authorization header' but got unexpected exception: 
org.apache.hadoop.fs.s3a.AWSBadRequestException: Instantiate 
org.apache.hadoop.fs.s3a.auth.AssumedRoleCredentialProvider on /: 
software.amazon.awssdk.services.sts.model.StsException: Invalid key=value pair 
(missing equal-sign) 
{code}

Rather than change the string to look for, let's just remove the string so it is 
less brittle in future







[jira] [Resolved] (HADOOP-18487) Make protobuf 2.5 an optional runtime dependency.

2024-08-29 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18487.
-
Fix Version/s: 3.4.1
   3.4.0
 Release Note: 
hadoop modules no longer export protobuf-2.5.0 as a dependency, and it is 
omitted from the hadoop distribution directory. Applications which use the 
library must declare an explicit dependency.

Hadoop uses a shaded version of protobuf3 internally, and does not use the 
2.5.0 JAR except when compiling compatible classes. It is still included in the 
binary distributions when the yarn timeline server is built with hbase 1

  was:
hadoop modules no longer export protobuf-2.5.0 as a dependency, and it is 
omitted from the hadoop distribution directory. Applications which use the 
library must declare an explicit dependency.

Hadoop uses a shaded version of protobuf3 internally, and does not use the 
2.5.0 JAR except when compiling compatible classes

   Resolution: Fixed

> Make protobuf 2.5 an optional runtime dependency.
> -
>
> Key: HADOOP-18487
> URL: https://issues.apache.org/jira/browse/HADOOP-18487
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build, ipc
>Affects Versions: 3.3.4
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.4.1, 3.4.0
>
>
> uses of protobuf 2.5 and RpcEngine have been deprecated since 3.3.0 in 
> HADOOP-17046
> while still keeping those files around (for a long time...), how about we 
> make the protobuf 2.5.0 export of hadoop-common and hadoop-hdfs *provided*, 
> rather than *compile*?
> that way, if apps want it for their own APIs, they have to explicitly ask for 
> it, but at least our own scans don't break.
> i have no idea what will happen to the rest of the stack at this point; it 
> will be "interesting" to see






[jira] [Resolved] (HADOOP-19131) WrappedIO to export modern filesystem/statistics APIs in a reflection friendly form

2024-08-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19131.
-
Resolution: Fixed

> WrappedIO to export modern filesystem/statistics APIs in a reflection 
> friendly form
> ---
>
> Key: HADOOP-19131
> URL: https://issues.apache.org/jira/browse/HADOOP-19131
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> parquet, avro etc are still stuck building with older hadoop releases. 
> This makes using new APIs hard (PARQUET-2171) and means that APIs which are 5 
> years old such as HADOOP-15229 just aren't picked up.
> This lack of openFile() adoption hurts working with files in cloud storage as
> * extra HEAD requests are made
> * read policies can't be explicitly set
> * split start/end can't be passed down
> HADOOP-18679 added a new WrappedIO class.
> This jira proposes extending this with
> * more of the filesystem/input stream methods
> * IOStatistics
> * Pull in parquet DynMethods to dynamically wrap and invoke through tests. 
> This class, DynamicWrappedIO, is intended to be copied into libraries 
> (parquet, iceberg) for their own use. 
> * existing tests to use the dynamic binding for end-to-end testing.
> +then get into the downstream libraries and use where appropriate
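
What adoption buys, sketched against the openFile() builder from HADOOP-15229: the read policy and known file length can be passed down, saving HEAD requests against cloud storage.

{code:java}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class OpenFileExample {
  public static void main(String[] args) throws Exception {
    Path path = new Path("s3a://example-bucket/data.parquet");
    FileSystem fs = path.getFileSystem(new Configuration());
    try (FSDataInputStream in = fs.openFile(path)
        .opt("fs.option.openfile.read.policy", "random") // columnar seeks
        .opt("fs.option.openfile.length", "104857600")   // skip HEAD probe
        .build()
        .get()) {                                        // CompletableFuture
      in.seek(0);
    }
  }
}
{code}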






[jira] [Resolved] (HADOOP-18542) Azure Token provider requires tenant and client IDs despite being optional

2024-08-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18542?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18542.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> Azure Token provider requires tenant and client IDs despite being optional
> --
>
> Key: HADOOP-18542
> URL: https://issues.apache.org/jira/browse/HADOOP-18542
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure, hadoop-thirdparty
>Affects Versions: 3.3.2, 3.3.3, 3.3.4
>Reporter: Carl
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The `AbfsConfiguration` class requires that we provide a tenant and client ID 
> when using the `MsiTokenProvider` class to fetch an authentication token. The 
> bug is that those fields are not required by the Azure API, which can infer 
> those fields when the call is made from an Azure instance.
> The fix is to make tenant and client ID optional when getting an Azure token 
> from the Azure Metadata Service.
> A fix has been submitted here: [https://github.com/apache/hadoop/pull/4262]
> The bug was introduced with HADOOP-17725  
> ([https://github.com/apache/hadoop/pull/3041/files])






[jira] [Resolved] (HADOOP-19187) ABFS: [FnsOverBlob]Making AbfsClient Abstract for supporting both DFS and Blob Endpoint

2024-08-20 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19187.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> ABFS: [FnsOverBlob]Making AbfsClient Abstract for supporting both DFS and 
> Blob Endpoint
> ---
>
> Key: HADOOP-19187
> URL: https://issues.apache.org/jira/browse/HADOOP-19187
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Azure Services support two different set of APIs.
> Blob: 
> [https://learn.microsoft.com/en-us/rest/api/storageservices/blob-service-rest-api]
>  
> DFS: 
> [https://learn.microsoft.com/en-us/rest/api/storageservices/datalakestoragegen2/operation-groups]
>  
> As per the plan in HADOOP-19179, this task enables ABFS Driver to work with 
> both set of APIs as per the requirement.
> Scope of this task is to refactor the ABfsClient so that ABFSStore can choose 
> to interact with the client it wants based on the endpoint configured by user.
> The blob endpoint support will remain "Unsupported" until the whole code is 
> checked-in and well tested.






[jira] [Resolved] (HADOOP-18965) ITestS3AHugeFilesEncryption failure

2024-08-20 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18965.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> ITestS3AHugeFilesEncryption failure
> ---
>
> Key: HADOOP-18965
> URL: https://issues.apache.org/jira/browse/HADOOP-18965
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> test failures for me with a test setup of per-bucket encryption of sse-kms.
> suspect (but can't guarantee) HADOOP-18850 may be a factor.






[jira] [Resolved] (HADOOP-19249) Getting NullPointerException when the unauthorised user tries to perform the key operation

2024-08-20 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19249.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> Getting NullPointerException when the unauthorised user tries to perform the 
> key operation
> --
>
> Key: HADOOP-19249
> URL: https://issues.apache.org/jira/browse/HADOOP-19249
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common, security
>Reporter: Dhaval Shah
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> While validating Tomcat 9.x in Apache Ranger, we faced an NPE for key 
> operations run from the hadoop command when the user doesn't have the 
> appropriate permission in Ranger policies.
> *Problem :*
> _Functionally -_ We are facing the NPE while performing key operations from 
> the hadoop command with a user not having permission in the policy, on a 
> cluster with Tomcat v9.x. However, curl to the Ranger KMS server works as 
> expected.
> _Technically -_ Getting response message as null on client side in 
> hadoop-common at 
> [KMSClientProvider.java|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/crypto/key/kms/KMSClientProvider.java#L565]
> *E.G.*
> _with Ranger KMS tomcat v9.x_
> {code:java}
>  hadoop key list
> The list subcommand displays the keynames contained within
> a particular provider as configured in core-site.xml or
> specified with the -provider argument. -metadata displays
> the metadata. If -strict is supplied, fail immediately if
> the provider requires a password and none is given.
> Exception in thread "main" java.lang.NullPointerException
>   at 
> org.apache.hadoop.crypto.key.KeyShell.prettifyException(KeyShell.java:541)
>   at 
> org.apache.hadoop.crypto.key.KeyShell.printException(KeyShell.java:536)
>   at org.apache.hadoop.tools.CommandShell.run(CommandShell.java:79)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:81)
>   at org.apache.hadoop.crypto.key.KeyShell.main(KeyShell.java:553) {code}
> _on_ _Ranger KMS_ _tomcat v8.5.x_
> {code:java}
> hadoop key list
> The list subcommand displays the keynames contained within
> a particular provider as configured in core-site.xml or
> specified with the -provider argument. -metadata displays
> the metadata. If -strict is supplied, fail immediately ifthe provider 
> requires a password and none is given.
> Executing command failed with the following exception: 
> AuthorizationException: User:xyzuser not allowed to do 'GET_KEYS'{code}
> *Debug logs on Ranger KMS Server side*
> 1.) Added logs in 
> [KMSExceptionsProvider.java|https://github.com/apache/ranger/blob/master/kms/src/main/java/org/apache/hadoop/crypto/key/kms/server/KMSExceptionsProvider.java]
>  in method _createResponse()_ and _toResponse()_ where we are generating 
> response to send it to client i.e. _hadoop-common_
> Logs are exactly same on both the tomcat scenario. Refer below the added 
> logs, detailed logs will be available in ranger kms log file on cluster. 
> {code:java}
> 2024-07-25 11:35:51,452 INFO  
> org.apache.hadoop.crypto.key.kms.server.KMSExceptionsProvider: 
> [https-jsse-nio-9494-exec-2]:  Entered into toResponse =
> 2024-07-25 11:35:51,452 INFO  
> org.apache.hadoop.crypto.key.kms.server.KMSExceptionsProvider: 
> [https-jsse-nio-9494-exec-2]:  exception 
> =org.apache.hadoop.security.authorize.AuthorizationException: 
> User:systest not allowed to do 'GET_KEYS'
> 2024-07-25 11:35:51,452 INFO  
> org.apache.hadoop.crypto.key.kms.server.KMSExceptionsProvider: 
> [https-jsse-nio-9494-exec-2]:  exception.getClass() =class 
> org.apache.hadoop.security.authorize.AuthorizationException
> 2024-07-25 11:35:51,452 INFO  
> org.apache.hadoop.crypto.key.kms.server.KMSExceptionsProvider: 
> [https-jsse-nio-9494-exec-2]:  AuthorizationException =
> 2024-07-25 11:35:51,452 WARN  org.apache.hadoop.crypto.key.kms.server.KMS: 
> [https-jsse-nio-9494-exec-2]: User syst...@root.comops.site (auth:KERBEROS) 
> request GET 
> https://ccycloud-1.ss-tomcat-test1.root.comops.site:9494/kms/v1/keys/names 
> caused exception.
> org.apache.hadoop.security.authorize.AuthorizationException: User:systest not 
> allowed to do 'GET_KEYS'
> 2024-07-25 11:35:51,452 INFO  
> org.apache.hadoop.crypto.key.kms.server.KMSExceptionsProvider: 
> [https-jsse-nio-9494-exec-2]: = Entered into createResponse ==
> 2024-07-25 11:35:51,452 INFO  
> org.apache.hadoop.crypto.key.kms.server.KMSExceptionsProvider: 
> [https-jsse-nio-9494-exec-2]:  status === Forbidden
> 2024-07-25 11:35:51,452 INFO  
> org.apache.hadoop.crypto.key.kms.server
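The client-side NPE comes from prettifying an exception whose message is null.
A minimal sketch of the null-safe handling needed in KeyShell, with helper
names assumed for illustration (this is not the shipped patch):

{code:java}
// Minimal sketch, not the actual fix: guard against a null exception
// message before prettifying it. With Tomcat 9.x the 403 response body
// can be empty, so getMessage() returns null and string handling NPEs.
private String prettifyException(Exception e) {
  String message = e.getMessage();
  if (message == null) {
    // fall back to the class name rather than dereferencing null
    return e.getClass().getSimpleName();
  }
  return e.getClass().getSimpleName() + ": " + message.split("\n")[0];
}
{code}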

[jira] [Resolved] (HADOOP-19253) Google GCS changes fail due to VectorIO changes

2024-08-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19253.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> Google GCS changes fail due to VectorIO changes
> ---
>
> Key: HADOOP-19253
> URL: https://issues.apache.org/jira/browse/HADOOP-19253
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 3.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> the changes of HADOOP-19098 broke google gcs
> {code}
> [INFO] 
> 
> [ERROR] Failed to execute goal 
> org.apache.maven.plugins:maven-compiler-plugin:3.10.1:compile 
> (default-compile) on project gcs-connector: Compilation failure
> [ERROR] 
> /Users/mthakur/Sandbox/open_source/hadoop-connectors/gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/VectoredIOImpl.java:[317,60]
> incompatible types: java.util.List<org.apache.hadoop.fs.FileRange> cannot be converted to 
> org.apache.hadoop.fs.FileRange[]
> [ERROR]
> {code}
> failing line is
> {code}
> FileRange[] sortedRanges = VectoredReadUtils.sortRanges(input);
> {code}
> need to restore the original {{sortRanges}}, renaming the one with the
> changed signature first. Plus a test, obviously.
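A sketch of the compatibility fix described above, under the assumption that
the list-returning variant is renamed (names are illustrative, not the
committed patch):

{code:java}
// Sketch only: restore the array-based signature the GCS connector
// compiles against, delegating to the renamed list-returning variant.
public static FileRange[] sortRanges(List<? extends FileRange> input) {
  List<? extends FileRange> sorted = sortRangeList(input); // assumed name
  return sorted.toArray(new FileRange[0]);
}
{code}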



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19253) Google GCS changes fail due to VectorIO changes

2024-08-16 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19253:
---

 Summary: Google GCS changes fail due to VectorIO changes
 Key: HADOOP-19253
 URL: https://issues.apache.org/jira/browse/HADOOP-19253
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 3.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran


the changes of HADOOP-19098 broke google gcs


{code}
[INFO] 
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-compiler-plugin:3.10.1:compile (default-compile) 
on project gcs-connector: Compilation failure
[ERROR] 
/Users/mthakur/Sandbox/open_source/hadoop-connectors/gcs/src/main/java/com/google/cloud/hadoop/fs/gcs/VectoredIOImpl.java:[317,60]
 incompatible types: java.util.List<org.apache.hadoop.fs.FileRange> cannot be converted to 
org.apache.hadoop.fs.FileRange[]
[ERROR]

{code}

failing line is

{code}
FileRange[] sortedRanges = VectoredReadUtils.sortRanges(input);

{code}


need to restore the original {{sortRanges}}, renaming the one with the changed
signature first. Plus a test, obviously.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19136) Upgrade commons-io to 2.16.1

2024-08-16 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19136.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> Upgrade commons-io to 2.16.1
> 
>
> Key: HADOOP-19136
> URL: https://issues.apache.org/jira/browse/HADOOP-19136
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 3.4.1
>Reporter: Shilun Fan
>Assignee: Shilun Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> commons-io can be upgraded from 2.14.0 to 2.16.1; try to upgrade.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19252) Release Hadoop Third-Party 1.3.0

2024-08-16 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19252:
---

 Summary: Release Hadoop Third-Party 1.3.0
 Key: HADOOP-19252
 URL: https://issues.apache.org/jira/browse/HADOOP-19252
 Project: Hadoop Common
  Issue Type: Task
  Components: hadoop-thirdparty
Affects Versions: thirdparty-1.3.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Create a release of the thirdparty jar with the protobuf version compatible
with all Java 8 builds.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19153) hadoop-common still exports logback as a transitive dependency

2024-08-16 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19153.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> hadoop-common still exports logback as a transitive dependency
> --
>
> Key: HADOOP-19153
> URL: https://issues.apache.org/jira/browse/HADOOP-19153
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, common
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Even though HADOOP-19084 set out to stop it, somehow ZK's declaration of a 
> logback dependency is still contaminating the hadoop-common dependency graph, 
> causing problems downstream.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19245) S3ABlockOutputStream no longer sends progress events in close()

2024-08-02 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19245.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> S3ABlockOutputStream no longer sends progress events in close()
> ---
>
> Key: HADOOP-19245
> URL: https://issues.apache.org/jira/browse/HADOOP-19245
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Critical
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> We don't get progress events passed through from S3ABlockOutputStream to any
> Progressable instance passed in that doesn't implement ProgressListener.
> This is due to a signature mismatch between the changed ProgressableListener
> interface and the {{S3ABlockOutputStream.ProgressListener}} impl.
> * critical because distcp jobs will time out on large uploads without this
> * trivial to fix; does need a test
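The missing wiring can be illustrated with a small adapter; a sketch under
assumed names, not the actual patch:

{code:java}
import org.apache.hadoop.util.Progressable;

// Sketch with assumed names: forward every block-upload callback to the
// plain Progressable the caller supplied, so long uploads in close()
// still emit heartbeats and distcp tasks are not timed out.
final class ProgressableAdapter {
  private final Progressable progress;

  ProgressableAdapter(Progressable progress) {
    this.progress = progress;
  }

  /** Must match the listener interface signature exactly to be invoked. */
  void progressChanged(long transferredBytes) {
    if (progress != null) {
      progress.progress(); // keep the task heartbeat alive
    }
  }
}
{code}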



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19245) S3ABlockOutputStream no longer sends progress events in close()

2024-08-01 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19245:
---

 Summary: S3ABlockOutputStream no longer sends progress events in 
close()
 Key: HADOOP-19245
 URL: https://issues.apache.org/jira/browse/HADOOP-19245
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


We don't get progress events passed through from S3ABlockOutputStream to any
Progressable instance passed in that doesn't implement ProgressListener.

This is due to a signature mismatch between the changed ProgressableListener 
interface and the {{S3ABlockOutputStream.ProgressListener}} impl.

* critical because distcp jobs will time out on large uploads without this
* trivial to fix; does need a test



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19161) S3A: option "fs.s3a.performance.flags" to take list of performance flags

2024-07-29 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19161.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> S3A: option "fs.s3a.performance.flags" to take list of performance flags
> 
>
> Key: HADOOP-19161
> URL: https://issues.apache.org/jira/browse/HADOOP-19161
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> HADOOP-19072 shows we want to add more optimisations than that of 
> HADOOP-18930.
> * Extending the new optimisations to the existing option is brittle
> * Adding explicit options for each feature gets complex fast.
> Proposed
> * A new class S3APerformanceFlags keeps all the flags
> * it builds this from a String[] of values, which can be extracted from 
> getConf(),
> * and it can also support a "*" option to mean "everything"
> * this class can also be handed off to hasPathCapability() and do the right 
> thing.
> Proposed optimisations
> * create file (we will hook up HADOOP-18930)
> * mkdir (HADOOP-19072)
> * delete (probe for parent path)
> * rename (probe for source path)
> We could think of more, with different names, later.
> The goal is to make it possible to strip out every HTTP request we do for 
> safety/posix compliance, so applications have the option of turning off what 
> they don't need.
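An illustrative configuration under this proposal; the flag names are
assumptions drawn from the list above, not a final spec:

{code:xml}
<property>
  <name>fs.s3a.performance.flags</name>
  <!-- or "*" to enable every optimisation -->
  <value>create,mkdir,delete,rename</value>
</property>
{code}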



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19229) Vector IO: have a max distance between ranges to range

2024-07-17 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19229:
---

 Summary: Vector IO: have a max distance between ranges to range
 Key: HADOOP-19229
 URL: https://issues.apache.org/jira/browse/HADOOP-19229
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.1
Reporter: Steve Loughran


Vector IO has a max size to coalesce ranges, but it also needs a maximum gap 
between ranges to justify the merge.
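A sketch of the coalescing predicate with both limits, using assumed
parameter names:

{code:java}
// Sketch: merge two adjacent ranges only if the gap between them is
// small enough and the merged range stays under the existing max size.
boolean shouldMerge(FileRange a, FileRange b, int maxSize, int maxGap) {
  long gap = b.getOffset() - (a.getOffset() + a.getLength());
  long mergedSize = b.getOffset() + b.getLength() - a.getOffset();
  return gap >= 0 && gap <= maxGap && mergedSize <= maxSize;
}
{code}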



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19220) S3A : S3AInputStream positioned readFully Expectation

2024-07-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19220.
-
Resolution: Works for Me

It works for me, and for the people whose support calls would ruin my life if
it didn't work for them.

You have probably done something with your mocking test setup that does not 
match what s3afs does. My recommendation is: step through the failing test with 
a debugger. 

I'm not going to look at the code, because the way to share anything like that
would be as a github reference. But anyway, this is not a jira-class issue -
not yet, anyway. This is the kind of problem to raise on the developer
mailing list.

For that reason, I'm going to close it as WORKSFORME, sorry. Stick the code on
github as a gist or something and discuss it on the hadoop developer list. If
it really is a bug in the s3a fs code, this jira can be re-opened.

> S3A : S3AInputStream positioned readFully Expectation
> -
>
> Key: HADOOP-19220
> URL: https://issues.apache.org/jira/browse/HADOOP-19220
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Reporter: Vinay Devadiga
>Priority: Major
>
> So basically I was trying to write some unit tests for the S3AInputStream
> readFully method:
> package org.apache.hadoop.fs.s3a;
> import java.io.EOFException;
> import java.io.FilterInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.net.SocketException;
> import java.net.URI;
> import java.nio.ByteBuffer;
> import java.nio.charset.Charset;
> import java.nio.charset.StandardCharsets;
> import java.util.concurrent.CompletableFuture;
> import java.util.concurrent.TimeUnit;
> import org.apache.commons.io.IOUtils;
> import org.apache.hadoop.conf.Configuration;
> import org.apache.hadoop.fs.Path;
> import org.apache.hadoop.fs.s3a.audit.impl.NoopSpan;
> import org.apache.hadoop.fs.s3a.auth.delegation.EncryptionSecrets;
> import org.apache.hadoop.util.BlockingThreadPoolExecutorService;
> import org.apache.hadoop.util.functional.CallableRaisingIOE;
> import org.assertj.core.api.Assertions;
> import org.junit.Before;
> import org.junit.Test;
> import software.amazon.awssdk.awscore.exception.AwsErrorDetails;
> import software.amazon.awssdk.awscore.exception.AwsServiceException;
> import software.amazon.awssdk.core.ResponseInputStream;
> import software.amazon.awssdk.http.AbortableInputStream;
> import software.amazon.awssdk.services.s3.S3Client;
> import software.amazon.awssdk.services.s3.model.GetObjectRequest;
> import software.amazon.awssdk.services.s3.model.GetObjectResponse;
> import static java.lang.Math.min;
> import static java.nio.charset.StandardCharsets.UTF_8;
> import static org.apache.hadoop.fs.s3a.Constants.ASYNC_DRAIN_THRESHOLD;
> import static org.apache.hadoop.fs.s3a.Constants.AWS_REGION;
> import static org.apache.hadoop.fs.s3a.Constants.FS_S3A;
> import static org.apache.hadoop.fs.s3a.Constants.MULTIPART_MIN_SIZE;
> import static org.apache.hadoop.fs.s3a.Constants.S3_CLIENT_FACTORY_IMPL;
> import static org.apache.hadoop.util.functional.FutureIO.eval;
> import static org.assertj.core.api.Assertions.assertThat;
> import static 
> org.assertj.core.api.AssertionsForClassTypes.assertThatExceptionOfType;
> import static org.mockito.ArgumentMatchers.any;
> import static org.mockito.Mockito.never;
> import static org.mockito.Mockito.verify;
> public class TestReadFullyAndPositionalRead {
> private S3AFileSystem fs;
> private S3AInputStream input;
> private S3Client s3;
> private static final String EMPTY = "";
> private static final String INPUT = "test_content";
> @Before
> public void setUp() throws IOException {
> Configuration conf = createConfiguration();
> fs = new S3AFileSystem();
> URI uri = URI.create(FS_S3A + "://" + MockS3AFileSystem.BUCKET);
> // Unset S3CSE property from config to avoid pathIOE.
> conf.unset(Constants.S3_ENCRYPTION_ALGORITHM);
> fs.initialize(uri, conf);
> s3 = fs.getS3AInternals().getAmazonS3Client("mocking");
> }
> public Configuration createConfiguration() {
> Configuration conf = new Configuration();
> conf.setClass(S3_CLIENT_FACTORY_IMPL, MockS3ClientFactory.class, 
> S3ClientFactory.class);
> // use minimum multipart size for faster triggering
> conf.setLong(Constants.MULTIPART_SIZE, MULTIPART_MIN_SIZE);
> conf.setInt(Constants.S3A_BUCKET_PROBE, 1);
> // this is so stream draining is always blocking, allowing assertions 
> to be safely made without worrying about any race conditions
> conf.setInt(ASYNC_DRAIN_THRESHOLD, Integer.MAX_VALUE);
> // set the region to avoid the getBucketLocation on FS init.
> conf.set(AWS_REGION, "eu-west-

[jira] [Created] (HADOOP-19221) S3a: retry on 400 +ErrorCode RequestTimeout

2024-07-08 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19221:
---

 Summary: S3a: retry on 400 +ErrorCode RequestTimeout
 Key: HADOOP-19221
 URL: https://issues.apache.org/jira/browse/HADOOP-19221
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


If a slow block update takes too long, then the connection is broken on the S3
side with an error message, as a 400 response:

{code}
Your socket connection to the server was not read from or written to within the 
timeout period. Idle connections will be closed. (Service: Amazon S3; Status 
Code: 400; Error Code: RequestTimeout; Request ID:; S3 Extended Request ID:
{code}

This is recoverable and should be treated as such, either using the normal 
exception policy or maybe even the throttle policy.
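A sketch of the classification this needs, against the v2 SDK exception type
(not the shipped retry policy):

{code:java}
import software.amazon.awssdk.awscore.exception.AwsServiceException;

// Sketch, not the shipped policy: a 400 whose error code is
// "RequestTimeout" is a broken-connection case and worth retrying.
static boolean isRetryableBadRequest(AwsServiceException e) {
  return e.statusCode() == 400
      && e.awsErrorDetails() != null
      && "RequestTimeout".equals(e.awsErrorDetails().errorCode());
}
{code}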





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19195) Upgrade aws sdk v2 to 2.25.53

2024-07-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19195.
-
Fix Version/s: 3.4.1
   Resolution: Fixed

merged to 3.4 and trunk branches

Harshit, can you leave the "fix version" field blank? Use target version to
indicate which version it is aimed at. We use the fix version to track which
versions it has actually been merged into, and for the automated release note
generation. Thanks.

> Upgrade aws sdk v2 to 2.25.53
> -
>
> Key: HADOOP-19195
> URL: https://issues.apache.org/jira/browse/HADOOP-19195
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Harshit Gupta
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Upgrade aws sdk v2 to 2.25.53



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19205) S3A initialization/close slower than with v1 SDK

2024-07-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19205?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19205.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> S3A initialization/close slower than with v1 SDK
> 
>
> Key: HADOOP-19205
> URL: https://issues.apache.org/jira/browse/HADOOP-19205
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
> Attachments: Screenshot 2024-06-14 at 17.12.59.png, Screenshot 
> 2024-06-14 at 17.14.33.png
>
>
> Hive QE have observed slowdown in LLAP queries due to time to create and 
> close s3a filesystem instances. A key aspect of that is they keep closing 
> the fs instances (HIVE-27884), but looking at the profiles, the reason things 
> seem to have regressed is
> * two s3 clients are being created (sync and async)
> * these seem to take a lot of time scanning the classpath for "global 
> interceptors", which is at least an O(jars) operation; #of index entries in 
> the zip files may factor too.
> Proposed:
> * create async client on demand when the transfer manager is invoked
> * look at why passwords are being scanned for if 
> InstanceProfileCredentialsProvider is in use...that seems slow too
> SDK wishes
> * SDK maybe allow us to turn off that scan for interceptors?
> attaching screenshots of the profile. storediag snippet:
> {code}
> [001]  fs.s3a.access.key = (unset)
> [002]  fs.s3a.secret.key = (unset)
> [003]  fs.s3a.session.token = (unset)
> [004]  fs.s3a.server-side-encryption-algorithm = (unset)
> [005]  fs.s3a.server-side-encryption.key = (unset)
> [006]  fs.s3a.encryption.algorithm = (unset)
> [007]  fs.s3a.encryption.key = (unset)
> [008]  fs.s3a.aws.credentials.provider = 
> "com.amazonaws.auth.InstanceProfileCredentialsProvider" [core-site.xml]
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19210) s3a: TestS3AAWSCredentialsProvider and TestS3AInputStreamRetry really slow

2024-07-02 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19210.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> s3a: TestS3AAWSCredentialsProvider and TestS3AInputStreamRetry really slow
> --
>
> Key: HADOOP-19210
> URL: https://issues.apache.org/jira/browse/HADOOP-19210
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.5.0, 3.4.1
>
>
> Not noticed this before, but the unit tests TestS3AAWSCredentialsProvider and
> TestS3AInputStreamRetry are so slow that they hurt overall test run times:
> no integration tests will start until these are all complete.
> {code}
> mvn test -T 1C -Dparallel-tests
> ...
> [INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 37.877 
> s - in org.apache.hadoop.fs.s3a.TestS3AInputStreamRetry
> ...
> [INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 
> 90.038 s - in org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider
> {code}
> The PR cuts the total execution time of a 10-thread test run from 3 minutes
> to 2:30.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19210) s3a: TestS3AAWSCredentialsProvider and TestS3AInputStreamRetry really slow

2024-06-26 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19210:
---

 Summary: s3a: TestS3AAWSCredentialsProvider and 
TestS3AInputStreamRetry really slow
 Key: HADOOP-19210
 URL: https://issues.apache.org/jira/browse/HADOOP-19210
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.5.0
Reporter: Steve Loughran


Not noticed this before, but the unit tests TestS3AAWSCredentialsProvider and
TestS3AInputStreamRetry are so slow that they hurt overall test run times: no
integration tests will start until these are all complete.


{code}

mvn test -T 1C -Dparallel-tests

...
[INFO] Tests run: 5, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 37.877 s 
- in org.apache.hadoop.fs.s3a.TestS3AInputStreamRetry
...
[INFO] Tests run: 25, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 90.038 
s - in org.apache.hadoop.fs.s3a.TestS3AAWSCredentialsProvider
{code}




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19194) Add test to find unshaded dependencies in the aws sdk

2024-06-24 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19194.
-
Fix Version/s: 3.4.1
   Resolution: Fixed

This highlights how many unshaded artifacts are in that bundle.jar, the one we
use precisely to avoid classpath problems, especially with the aws sdk trying
to dictate the jackson library.

h2. Should we give up shipping it?

yes: it's tainted, things like netty are still there
no: at least jackson is shaded.

I'm not happy about the slf4j or netty classes.
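The kind of check involved can be sketched as a jar scan; the details here are
assumptions, not the committed test:

{code:java}
import java.io.IOException;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Sketch only: list .class entries in bundle.jar that live outside the
// shaded software/amazon prefix, i.e. candidate unshaded artifacts such
// as the netty or slf4j classes mentioned above.
public class UnshadedScan {
  public static void main(String[] args) throws IOException {
    try (JarFile jar = new JarFile(args.length > 0 ? args[0] : "bundle.jar")) {
      jar.stream()
          .map(JarEntry::getName)
          .filter(n -> n.endsWith(".class"))
          .filter(n -> !n.startsWith("software/amazon/"))
          .forEach(n -> System.out.println("unshaded: " + n));
    }
  }
}
{code}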

> Add test to find unshaded dependencies in the aws sdk
> -
>
> Key: HADOOP-19194
> URL: https://issues.apache.org/jira/browse/HADOOP-19194
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Harshit Gupta
>Assignee: Harshit Gupta
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> Write a test to assess the aws sdk for unshaded artefacts on the classpath 
> which might cause deployment failures. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19204) VectorIO regression: empty ranges are now rejected

2024-06-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19204?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19204.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> VectorIO regression: empty ranges are now rejected
> --
>
> Key: HADOOP-19204
> URL: https://issues.apache.org/jira/browse/HADOOP-19204
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 3.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The validation now rejects a readvectored with an empty range, whereas before 
> it was a no-op
> Proposed fix, return the empty list; add test



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19203) WrappedIO BulkDelete API to raise IOEs as UncheckedIOExceptions

2024-06-20 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19203.
-
Fix Version/s: 3.4.1
   Resolution: Fixed

> WrappedIO BulkDelete API to raise IOEs as UncheckedIOExceptions
> ---
>
> Key: HADOOP-19203
> URL: https://issues.apache.org/jira/browse/HADOOP-19203
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 3.4.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> It's easier to invoke methods through reflection via parquet/iceberg
> DynMethods if the invoked method raises unchecked exceptions, because it
> doesn't then rewrap the raised exception in a generic RuntimeException.
> Catching the IOEs and wrapping as UncheckedIOEs makes it much easier to
> unwrap IOEs after the invocation.
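The wrapping pattern reads roughly like this; a sketch built on the existing
CallableRaisingIOE functional type:

{code:java}
import java.io.IOException;
import java.io.UncheckedIOException;
import org.apache.hadoop.util.functional.CallableRaisingIOE;

// Sketch of the wrapping pattern: reflection-based callers (DynMethods)
// catch UncheckedIOException and unwrap its cause, rather than digging
// an IOE out of a generic RuntimeException.
static <T> T uncheckIOExceptions(CallableRaisingIOE<T> call) {
  try {
    return call.apply();
  } catch (IOException e) {
    throw new UncheckedIOException(e);
  }
}
{code}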



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18508) support multiple s3a integration test runs on same bucket in parallel

2024-06-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18508.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> support multiple s3a integration test runs on same bucket in parallel
> -
>
> Key: HADOOP-18508
> URL: https://issues.apache.org/jira/browse/HADOOP-18508
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.9
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> To have (internal, sorry) jenkins test runs work in parallel, they need to
> share the same bucket, so (see the example invocation below):
> # there must be a prefix for the job id, which is passed in to the path used for forks
> # root tests can be disabled so the runs don't stamp on each other
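An example invocation; the property names are assumptions based on the s3a
testing docs, not guaranteed to match the final patch:

{code}
mvn verify -T 1C -Dparallel-tests -DtestsThreadCount=8 \
    -Djob.id=001 -Droot.tests.enabled=false
{code}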



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18931) FileSystem.getFileSystemClass() to log at debug the jar the .class came from

2024-06-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18931.
-
Fix Version/s: 3.5.0
   3.4.1
 Assignee: Viraj Jasani
   Resolution: Fixed

> FileSystem.getFileSystemClass() to log at debug the jar the .class came from
> 
>
> Key: HADOOP-18931
> URL: https://issues.apache.org/jira/browse/HADOOP-18931
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 3.3.6
>Reporter: Steve Loughran
>Assignee: Viraj Jasani
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> We want to be able to log the jar containing the filesystem implementation
> class, so that we can identify which version of a module the class came from.
> This is to help track down problems where different machines in the cluster,
> or the .tar.gz bundle, are out of date.
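The lookup itself is standard Java; a minimal sketch of what the debug log
line needs:

{code:java}
import java.security.CodeSource;

// Minimal sketch: find the jar (or directory) a class was loaded from.
static String sourceOf(Class<?> clazz) {
  CodeSource src = clazz.getProtectionDomain().getCodeSource();
  return src == null ? "(unknown)" : src.getLocation().toString();
}
{code}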



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19192) Log level is WARN when fail to load native hadoop libs

2024-06-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19192?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19192.
-
Fix Version/s: 3.5.0
   3.4.1
 Assignee: Cheng Pan
   Resolution: Fixed

> Log level is WARN when fail to load native hadoop libs
> --
>
> Key: HADOOP-19192
> URL: https://issues.apache.org/jira/browse/HADOOP-19192
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation
>Affects Versions: 3.3.6
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19205) S3A initialization/close slower than with v1 SDK

2024-06-14 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19205:
---

 Summary: S3A initialization/close slower than with v1 SDK
 Key: HADOOP-19205
 URL: https://issues.apache.org/jira/browse/HADOOP-19205
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Hive QE have observed slowdown in LLAP queries due to time to create and close 
s3a filesystem instances. A key aspect of that is they keep closing the fs 
instances (HIVE-27884), but looking at the profiles, the reason things seem to 
have regressed is

* two s3 clients are being created (sync and async)
* these seem to take a lot of time scanning the classpath for "global 
interceptors", which is at least an O(jars) operation; #of index entries in the 
zip files may factor too.

Proposed:
* create async client on demand when the transfer manager is invoked
* look at why passwords are being scanned for if 
InstanceProfileCredentialsProvider is in use...that seems slow too

SDK wishes
* SDK maybe allow us to turn off that scan for interceptors?

attaching screenshots of the profile. storediag snippet:
{code}

[001]  fs.s3a.access.key = (unset)
[002]  fs.s3a.secret.key = (unset)
[003]  fs.s3a.session.token = (unset)
[004]  fs.s3a.server-side-encryption-algorithm = (unset)
[005]  fs.s3a.server-side-encryption.key = (unset)
[006]  fs.s3a.encryption.algorithm = (unset)
[007]  fs.s3a.encryption.key = (unset)
[008]  fs.s3a.aws.credentials.provider = 
"com.amazonaws.auth.InstanceProfileCredentialsProvider" [core-site.xml]

{code}
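The first proposed change above is a classic lazy singleton; a sketch with
assumed names, not the shipped patch:

{code:java}
import software.amazon.awssdk.services.s3.S3AsyncClient;

// Sketch with assumed names: build the async client only when the
// transfer manager first needs it, so filesystem instantiation pays
// for one client (and one interceptor classpath scan), not two.
private volatile S3AsyncClient asyncClient;

private S3AsyncClient getOrCreateAsyncClient() {
  if (asyncClient == null) {
    synchronized (this) {
      if (asyncClient == null) {
        asyncClient = S3AsyncClient.builder().build(); // real config elided
      }
    }
  }
  return asyncClient;
}
{code}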




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19204) VectorIO regression: empty ranges are now rejected

2024-06-12 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19204:
---

 Summary: VectorIO regression: empty ranges are now rejected
 Key: HADOOP-19204
 URL: https://issues.apache.org/jira/browse/HADOOP-19204
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 3.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran


The validation now rejects a readvectored with an empty range, whereas before 
it was a no-op

Proposed fix, return the empty list; add test





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19203) WrappedIO BulkDelete API to raise IOEs as UncheckedIOExceptions

2024-06-12 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19203:
---

 Summary: WrappedIO BulkDelete API to raise IOEs as 
UncheckedIOExceptions
 Key: HADOOP-19203
 URL: https://issues.apache.org/jira/browse/HADOOP-19203
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs
Affects Versions: 3.4.1
Reporter: Steve Loughran



It's easier to invoke methods through reflection via parquet/iceberg
DynMethods if the invoked method raises unchecked exceptions, because it
doesn't then rewrap the raised exception in a generic RuntimeException.

Catching the IOEs and wrapping as UncheckedIOEs makes it much easier to unwrap
IOEs after the invocation.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19199) Include FileStatus when opening a file from FileSystem

2024-06-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19199.
-
Resolution: Duplicate

Closing as a duplicate of HADOOP-15229. 

I absolutely agree the head requests are needless. Which is why we added
exactly the feature you wanted in 2019, *five years ago*. And with
HADOOP-16202, you only need to pass in the file length, so if you can store
that in your manifests, then you can skip the HEAD call (s3a; abfs still needs it).

The problem we have is therefore not that the Hadoop library lacks this, it is
that libraries and applications haven't taken it up. Why not? Because they
want to compile against versions of Hadoop that are over 10 years old, which
means that all the improvements we have done are wasted. Although private
forks can do this, it's very hard to get this taken up consistently, and
people like you and I suffer in wasted time and money.

What can be done? Well, I have concluded that trying to get the projects to
upgrade doesn't work, and waiting for the libraries to "get up-to-date" is a
moving target, as we are always trying to improve in this area. Instead, all
our new work is being targeted at being "reflection-friendly", expecting the
initial take-up to be through reflection. In HADOOP-19131 I am exporting the
existing openFile() API (which takes a builder and returns an asynchronously
evaluated input stream) as an easy-to-reflect function:

{code}
public static FSDataInputStream fileSystem_openFile(
  final FileSystem fs,
  final Path path,
  final String policy,
  final FileStatus status,
  final Long length,
  final Map<String, String> options) throws IOException {
{code}

The "policy" is also critical as it tells the storage layer what access policy 
you want, such as random or sequential. I'm going to add an explicit "parquet" 
policy here too, which hence to the library that footer caching would be good.

What can you do then? Other than just waiting for this to happen? Help us get 
this through the stack. We need it in: parquet, iceberg, spark, avro. 

Can you start by reviewing HADOOP-19131 and seeing how well you think it will 
integrate *and anything you can do in terms of Proof of Concept PRs using this 
patch*, so we can identify problems before the hadoop patch is merged.
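For anyone wiring this up directly rather than through reflection, the
existing builder API already accepts the status; a usage sketch:

{code:java}
import java.util.concurrent.CompletableFuture;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.util.functional.FutureIO;

// Usage sketch of the existing openFile() builder: passing the known
// FileStatus lets s3a skip the HEAD request entirely.
CompletableFuture<FSDataInputStream> future = fs.openFile(path)
    .withFileStatus(status)
    .opt("fs.option.openfile.read.policy", "random")
    .build();
try (FSDataInputStream in = FutureIO.awaitFuture(future)) {
  // read the footer, etc.
}
{code}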


> Include FileStatus when opening a file from FileSystem
> --
>
> Key: HADOOP-19199
> URL: https://issues.apache.org/jira/browse/HADOOP-19199
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 3.4.0
>Reporter: Oliver Caballero Alvarez
>Priority: Major
>  Labels: pull-request-available
>
> The FileSystem abstract class prevents you from using information you
> already have about a file's FileStatus to open that file, which means that
> implementations of the open method have to request the FileStatus of the
> same file again, making unnecessary requests.
> A very clear example is seen in today's latest version of the parquet-hadoop 
> implementation, where:
> https://github.com/apache/parquet-java/blob/apache-parquet-1.14.0/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopInputFile.java
> Although to create the implementation you had to consult the file to know
> its FileStatus, when opening it only the path is included, since that is all
> the FileSystem implementation allows you to pass. This implies that the
> implementation will, in its open function, verify that the file exists, or
> what information the file has, and perform the same operation again to
> collect the FileStatus.
>  
> This would simply be resolved by taking the latest current version:
> [https://github.com/apache/hadoop/blob/release-3.4.0-RC3/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/fs/FileSystem.java]
> and including the following:
> {code:java}
> public FSDataInputStream open(FileStatus f) throws IOException {
>   return this.open(f.getPath(),
>       this.getConf().getInt("io.file.buffer.size", 4096));
> }
> {code}
>  
> This would imply that it is backward compatible with all current Filesystems, 
> but since it is in the implementation it could be used when this information 
> is already known.
>  
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19200) Reduce the number of headObject when opening a file with the s3 file system

2024-06-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19200.
-
Resolution: Duplicate

> Reduce the number of headObject when opening a file with the s3 file system
> ---
>
> Key: HADOOP-19200
> URL: https://issues.apache.org/jira/browse/HADOOP-19200
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Oliver Caballero Alvarez
>Priority: Major
>
> In the implementation of the S3 filesystem in the hadoop-aws package, if you
> use it with spark, every time you open a file you send two HeadObject
> requests: to open the file, you first look to see whether the file exists,
> executing a HeadObject, and then on opening it the implementation, in both
> sdk1 and sdk2, is forced to make a HeadObject request again. This is not the
> fault of the implementation of this class (S3AFileSystem), but of the
> abstract FileSystem class of the Hadoop core, since it does not allow the
> FileStatus to be passed but only allows the use of Path.
> If the FileSystem implementation is changed, it could avoid having to
> request that HeadObject again.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18516) [ABFS]: Support fixed SAS token config in addition to Custom SASTokenProvider Implementation

2024-06-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18516.
-
Fix Version/s: 3.4.1
   Resolution: Fixed

> [ABFS]: Support fixed SAS token config in addition to Custom SASTokenProvider 
> Implementation
> 
>
> Key: HADOOP-18516
> URL: https://issues.apache.org/jira/browse/HADOOP-18516
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sree Bhattacharyya
>Assignee: Anuj Modi
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> This PR introduces a new configuration for Fixed SAS Tokens: 
> *"fs.azure.sas.fixed.token"*
> Using this new configuration, users can configure a fixed SAS Token in the 
> account settings files itself. Ideally, this should be used with SAS Tokens 
> that are scoped at a container or account level (Service or Account SAS), 
> which can be considered to be a constant for one account or container, over 
> multiple operations.
> The other method of using a SAS Token remains valid as well, where a user 
> provides a custom implementation of the SASTokenProvider interface, using 
> which a SAS Token is obtained.
> When an Account SAS Token is configured as the fixed SAS Token, and it is 
> used, it is ensured that operations are within the scope of the SAS Token.
> The code checks for whether the fixed token and the token provider class 
> implementation are configured. In the case of both being set, preference is 
> given to the custom SASTokenProvider implementation. It must be noted that if 
> such an implementation provides a SAS Token which has a lower scope than 
> Account SAS, some filesystem and service level operations might be out of 
> scope and may not succeed.
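An illustrative account configuration; the property name comes from the issue
text, the value is obviously a placeholder:

{code:xml}
<property>
  <name>fs.azure.sas.fixed.token</name>
  <value>SAS_TOKEN_PLACEHOLDER</value>
</property>
{code}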



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19178) WASB Driver Deprecation and eventual removal

2024-06-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19178.
-
Fix Version/s: 3.3.9
   3.5.0
 Assignee: Anuj Modi  (was: Sneha Vijayarajan)
   Resolution: Fixed

> WASB Driver Deprecation and eventual removal
> 
>
> Key: HADOOP-19178
> URL: https://issues.apache.org/jira/browse/HADOOP-19178
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Sneha Vijayarajan
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> *WASB Driver*
> WASB driver was developed to support FNS (FlatNameSpace) Azure Storage 
> accounts. FNS accounts do not honor File-Folder syntax. HDFS folder
> operations hence are mimicked at the client side by the WASB driver, and
> certain folder operations like Rename and Delete can lead to a lot of IOPS,
> with client-side enumeration and orchestration of the rename/delete
> operation blob by blob. It was not ideal for other APIs either, as the
> initial check for whether a path is a file or folder needs to be done over
> multiple metadata calls. These led to degraded performance.
> To provide better service to Analytics customers, Microsoft released ADLS 
> Gen2, which is HNS (Hierarchical Namespace), i.e. a File-Folder aware store. 
> ABFS driver was designed to overcome the inherent deficiencies of WASB and 
> customers were informed to migrate to ABFS driver.
> *Customers who still use the legacy WASB driver and the challenges they face* 
> Some of our customers have not migrated to the ABFS driver yet and continue 
> to use the legacy WASB driver with FNS accounts.  
> These customers face the following challenges: 
>  * They cannot leverage the optimizations and benefits of the ABFS driver.
>  * They need to deal with the compatibility issues should the files and 
> folders were modified with the legacy WASB driver and the ABFS driver 
> concurrently in a phased transition situation.
>  * There are differences for supported features for FNS and HNS over ABFS 
> Driver
>  * In certain cases, they must perform a significant amount of re-work on 
> their workloads to migrate to the ABFS driver, which is available only on HNS 
> enabled accounts in a fully tested and supported scenario.
> *Deprecation plans for WASB*
> We are introducing a new feature that will enable the ABFS driver to support 
> FNS accounts (over BlobEndpoint) using the ABFS scheme. This feature will 
> enable customers to use the ABFS driver to interact with data stored in GPv2 
> (General Purpose v2) storage accounts. 
> With this feature, the customers who still use the legacy WASB driver will be 
> able to migrate to the ABFS driver without much re-work on their workloads. 
> They will however need to change the URIs from the WASB scheme to the ABFS 
> scheme. 
> Once ABFS driver has built FNS support capability to migrate WASB customers, 
> WASB driver will be declared deprecated in OSS documentation and marked for 
> removal in next major release. This will remove any ambiguity for new 
> customer onboards as there will be only one Microsoft driver for Azure 
> Storage and migrating customers will get SLA bound support for driver and 
> service, which was not guaranteed over WASB.
>  We anticipate that this feature will serve as a stepping stone for customers 
> to move to HNS enabled accounts with the ABFS driver, which is our 
> recommended stack for big data analytics on ADLS Gen2. 
> *Any Impact for* *existing customers who are using ADLS Gen2 (HNS enabled 
> account) with ABFS driver* *?*
> This feature does not impact the existing customers who are using ADLS Gen2 
> (HNS enabled account) with ABFS driver.
> They do not need to make any changes to their workloads or configurations. 
> They will still enjoy the benefits of HNS, such as atomic operations, 
> fine-grained access control, scalability, and performance. 
> *Official recommendation*
> Microsoft continues to recommend all Big Data and Analytics customers to use 
> Azure Data Lake Gen2 (ADLS Gen2) using the ABFS driver and will continue to 
> optimize this scenario in future, we believe that this new option will help 
> all those customers to transition to a supported scenario immediately, while 
> they plan to ultimately move to ADLS Gen2 (HNS enabled account).
>  *New Authentication options that a WASB to ABFS Driver migrating customer 
> will get*
> Below auth types that WASB provides will continue to work on the new FNS over 
> ABFS Driver over configuration that accepts these SAS types (similar to WASB)
>  * SharedKey
>  * Account SAS
>  * Service/Container SAS
> Below authentication types that were not supported by WASB drive

[jira] [Resolved] (HADOOP-19114) upgrade to commons-compress 1.26.1 due to cves

2024-06-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19114.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> upgrade to commons-compress 1.26.1 due to cves
> --
>
> Key: HADOOP-19114
> URL: https://issues.apache.org/jira/browse/HADOOP-19114
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, CVE
>Affects Versions: 3.4.0
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> 2 recent CVEs fixed - 
> https://mvnrepository.com/artifact/org.apache.commons/commons-compress
> Important: Denial of Service CVE-2024-25710
> Moderate: Denial of Service CVE-2024-26308



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19196) Bulk delete api doesn't take the path to delete as the base path

2024-06-06 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19196:
---

 Summary: Bulk delete api doesn't take the path to delete as the 
base path
 Key: HADOOP-19196
 URL: https://issues.apache.org/jira/browse/HADOOP-19196
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs
Affects Versions: 3.5.0, 3.4.1
Reporter: Steve Loughran


If you use the path of the file you intend to delete as the base path, you get
an error. This is because the validation requires the list to be of children
of the base path, but the base path itself should also be valid.
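The failure mode can be shown with the API itself; a sketch in which the base
path is the file being deleted:

{code:java}
import java.io.IOException;
import java.util.Collections;
import org.apache.hadoop.fs.BulkDelete;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch of the failing case: base path == the file to delete, which
// the validation should accept rather than reject as "not a child".
static void deleteSelf(FileSystem fs, Path file) throws IOException {
  try (BulkDelete bd = fs.createBulkDelete(file)) {
    bd.bulkDelete(Collections.singletonList(file));
  }
}
{code}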



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19193) Create orphan commit for website deployment

2024-06-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19193.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Create orphan commit for website deployment
> ---
>
> Key: HADOOP-19193
> URL: https://issues.apache.org/jira/browse/HADOOP-19193
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19188) TestHarFileSystem and TestFilterFileSystem failing after bulk delete API added

2024-06-04 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19188.
-
Resolution: Fixed

> TestHarFileSystem and TestFilterFileSystem failing after bulk delete API added
> --
>
> Key: HADOOP-19188
> URL: https://issues.apache.org/jira/browse/HADOOP-19188
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs, test
>Affects Versions: 3.5.0, 3.4.1
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> oh, we need to update a couple of tests so they know not to worry about the 
> new interface/method. The details are in the javadocs of FileSystem.
> Interestingly, these snuck through yetus, though they fail in PRs based atop 
> #6726
> {code}
> [ERROR] Failures: 
> [ERROR] org.apache.hadoop.fs.TestFilterFileSystem.testFilterFileSystem
> [ERROR]   Run 1: TestFilterFileSystem.testFilterFileSystem:181 1 methods were 
> not overridden correctly - see log
> [ERROR]   Run 2: TestFilterFileSystem.testFilterFileSystem:181 1 methods were 
> not overridden correctly - see log
> [ERROR]   Run 3: TestFilterFileSystem.testFilterFileSystem:181 1 methods were 
> not overridden correctly - see log
> [INFO] 
> [ERROR] org.apache.hadoop.fs.TestHarFileSystem.testInheritedMethodsImplemented
> [ERROR]   Run 1: TestHarFileSystem.testInheritedMethodsImplemented:402 1 
> methods were not overridden correctly - see log
> [ERROR]   Run 2: TestHarFileSystem.testInheritedMethodsImplemented:402 1 
> methods were not overridden correctly - see log
> [ERROR]   Run 3: TestHarFileSystem.testInheritedMethodsImplemented:402 1 
> methods were not overridden correctly - see log
> {code}
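The usual fix pattern is to declare the new method in the tests' "must not
implement" lists; a sketch, with placement and signature assumed:

{code:java}
// Sketch, names assumed: TestHarFileSystem keeps an interface listing
// FileSystem methods HarFileSystem deliberately does not override; the
// new bulk delete factory method needs an entry there.
private interface MustNotImplement {
  BulkDelete createBulkDelete(Path path)
      throws IllegalArgumentException, IOException;
}
{code}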



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19191) Batch APIs for delete

2024-06-04 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19191.
-
Resolution: Duplicate

Fixed in HADOOP-18679; there's an iceberg PR up to use the reflection-friendly
WrappedIO access point.

That feature will ship in hadoop 3.4.1; I would like a basic backport to
branch-3.3 where, even though the full s3a-side backport would be impossible
(sdk versions...), we could at least offer the public API to all and the
page-size=1 DELETE call for S3, *without any safety checks*. It'll still save
some LIST calls and encourage adoption.

If you want to get involved there, happy to take PRs (under the original JIRA).

> Batch APIs for delete
> -
>
> Key: HADOOP-19191
> URL: https://issues.apache.org/jira/browse/HADOOP-19191
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs
>Reporter: Alkis Evlogimenos
>Priority: Major
>
> Add batch APIs for delete to allow better performance for object stores:
> {{boolean[] delete(Path[] paths);}}
> The API should have a default implementation that delegates to the singular 
> delete. Implementations can override it to provide better performance.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19189) ITestS3ACommitterFactory failing

2024-05-31 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19189:
---

 Summary: ITestS3ACommitterFactory failing
 Key: HADOOP-19189
 URL: https://issues.apache.org/jira/browse/HADOOP-19189
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.4.0
Reporter: Steve Loughran


we've had ITestS3ACommitterFactory failing for a while, where it looks like 
changed committer settings aren't being picked up.

{code}
ERROR] ITestS3ACommitterFactory.testEverything:115->testInvalidFileBinding:165 
Expected a org.apache.hadoop.fs.s3a.commit.PathCommitException to be thrown, 
but got the result: : 
FileOutputCommitter{PathOutputCommitter{context=TaskAttemptContextImpl{JobContextImpl
{code}

I've spent some time looking at it, and it is happening because the test sets
the filesystem ref for the local test fs, and not that of the filesystem
created by the committer, which is where the option is picked up.

I've tried to parameterize it, but things are still playing up and I'm not
sure how hard to try to fix it.






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19188) TestHarFileSystem and TestFilterFileSystem failing after bulk delete API added

2024-05-27 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19188:
---

 Summary: TestHarFileSystem and TestFilterFileSystem failing after 
bulk delete API added
 Key: HADOOP-19188
 URL: https://issues.apache.org/jira/browse/HADOOP-19188
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs, test
Affects Versions: 3.5.0
Reporter: Steve Loughran
Assignee: Mukund Thakur


We need to update a couple of tests so they know not to worry about the new 
interface/method; the details are in the javadocs of FileSystem.

Interestingly, these snuck through Yetus, though they fail in PRs based atop 
#6726

{code}
[ERROR] Failures: 
[ERROR] org.apache.hadoop.fs.TestFilterFileSystem.testFilterFileSystem
[ERROR]   Run 1: TestFilterFileSystem.testFilterFileSystem:181 1 methods were 
not overridden correctly - see log
[ERROR]   Run 2: TestFilterFileSystem.testFilterFileSystem:181 1 methods were 
not overridden correctly - see log
[ERROR]   Run 3: TestFilterFileSystem.testFilterFileSystem:181 1 methods were 
not overridden correctly - see log
[INFO] 
[ERROR] org.apache.hadoop.fs.TestHarFileSystem.testInheritedMethodsImplemented
[ERROR]   Run 1: TestHarFileSystem.testInheritedMethodsImplemented:402 1 
methods were not overridden correctly - see log
[ERROR]   Run 2: TestHarFileSystem.testInheritedMethodsImplemented:402 1 
methods were not overridden correctly - see log
[ERROR]   Run 3: TestHarFileSystem.testInheritedMethodsImplemented:402 1 
methods were not overridden correctly - see log

{code}
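For reference, the fix pattern, as a hedged sketch: TestFilterFileSystem and 
TestHarFileSystem each keep a local interface listing FileSystem method 
signatures, checked by reflection, so a new API such as bulk delete just needs 
its signature registered there. The exact signature below is an assumption, 
not the shipped test code:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.BulkDelete;
import org.apache.hadoop.fs.Path;

// Sketch only: the kind of entry added to the tests' method-listing
// interfaces so the reflection check recognises the new API and stops
// flagging it as "not overridden correctly".
interface MustNotImplement {
  BulkDelete createBulkDelete(Path path) throws IOException;
}
{code}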




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18962) Upgrade kafka to 3.4.0

2024-05-24 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18962.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> Upgrade kafka to 3.4.0
> --
>
> Key: HADOOP-18962
> URL: https://issues.apache.org/jira/browse/HADOOP-18962
> Project: Hadoop Common
>  Issue Type: Bug
>Reporter: D M Murali Krishna Reddy
>Assignee: D M Murali Krishna Reddy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Upgrade kafka-clients to 3.4.0 to fix 
> https://nvd.nist.gov/vuln/detail/CVE-2023-25194



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19168) Upgrade Kafka Clients due to CVEs

2024-05-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19168?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19168.
-
Resolution: Duplicate

Rohit, this is a dupe of HADOOP-18962; let's focus on that.

> Upgrade Kafka Clients due to CVEs
> -
>
> Key: HADOOP-19168
> URL: https://issues.apache.org/jira/browse/HADOOP-19168
> Project: Hadoop Common
>  Issue Type: Task
>Reporter: Rohit Kumar
>Priority: Major
>  Labels: pull-request-available
>
> Upgrade Kafka Clients due to CVEs
> CVE-2023-25194:- Affected versions of this package are vulnerable to 
> Deserialization of Untrusted Data when there are gadgets in the 
> {{{}classpath{}}}. The server will connect to the attacker's LDAP server and 
> deserialize the LDAP response, which the attacker can use to execute java 
> deserialization gadget chains on the Kafka connect server.
> CVSS Score:- 8.8(High)
> [https://nvd.nist.gov/vuln/detail/CVE-2023-25194] 
> CVE-2021-38153
> CVE-2018-17196
> Insufficient Entropy
> [https://security.snyk.io/package/maven/org.apache.kafka:kafka-clients] 
> Upgrade Kafka-Clients to 3.4.0 or higher.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19182) Upgrade kafka to 3.4.0

2024-05-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19182.
-
Resolution: Duplicate

> Upgrade kafka to 3.4.0
> --
>
> Key: HADOOP-19182
> URL: https://issues.apache.org/jira/browse/HADOOP-19182
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Reporter: fuchaohong
>Priority: Major
>  Labels: pull-request-available
>
> Upgrade kafka to 3.4.0 to resolve CVE-2023-25194



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19185) Improve ABFS metric integration with iOStatistics

2024-05-23 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19185:
---

 Summary: Improve ABFS metric integration with iOStatistics
 Key: HADOOP-19185
 URL: https://issues.apache.org/jira/browse/HADOOP-19185
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Steve Loughran


Followup to HADOOP-18325 covering the outstanding comments of

https://github.com/apache/hadoop/pull/6314/files





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18325) ABFS: Add correlated metric support for ABFS operations

2024-05-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18325.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> ABFS: Add correlated metric support for ABFS operations
> ---
>
> Key: HADOOP-18325
> URL: https://issues.apache.org/jira/browse/HADOOP-18325
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.3
>Reporter: Anmol Asrani
>Assignee: Anmol Asrani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Add metrics related to a particular job, covering the total number of 
> requests, retried requests, retry count and others.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19163) Upgrade protobuf version to 3.25.3

2024-05-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19163?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19163.
-
Resolution: Fixed

Done; not sure what version to tag this with.

Proposed: we cut a new release of this.

> Upgrade protobuf version to 3.25.3
> --
>
> Key: HADOOP-19163
> URL: https://issues.apache.org/jira/browse/HADOOP-19163
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: hadoop-thirdparty
>Reporter: Bilwa S T
>Assignee: Bilwa S T
>Priority: Major
>  Labels: pull-request-available
>  Time Spent: 20m
>  Remaining Estimate: 0h
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19181) IAMCredentialsProvider throttle failures

2024-05-20 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19181:
---

 Summary: IAMCredentialsProvider throttle failures
 Key: HADOOP-19181
 URL: https://issues.apache.org/jira/browse/HADOOP-19181
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Tests report throttling errors in IAM being remapped to no-auth failures.

Again, Impala tests, but with multiple processes on the same host. This means 
that HADOOP-18945 isn't sufficient: even if it ensures a singleton instance for 
a process
* it doesn't if there are many test buckets (fixable)
* it doesn't work across processes (not fixable)

We may be able to
* use a singleton across all filesystem instances (see the sketch below)
* once we know how throttling is reported, handle it through retries + 
error/stats collection
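A minimal sketch of the singleton idea, assuming the AWS SDK v2 
InstanceProfileCredentialsProvider; the class name and structure are 
illustrative only:

{code:java}
import software.amazon.awssdk.auth.credentials.AwsCredentialsProvider;
import software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider;

// Sketch only: one process-wide IAM credentials provider shared by all
// filesystem instances, so concurrent FS initialisations in one process
// don't each hit (and get throttled by) the instance metadata service.
public final class ProcessWideIAMCredentials {
  private static volatile AwsCredentialsProvider instance;

  private ProcessWideIAMCredentials() {
  }

  public static AwsCredentialsProvider get() {
    if (instance == null) {
      synchronized (ProcessWideIAMCredentials.class) {
        if (instance == null) {
          instance = InstanceProfileCredentialsProvider.builder()
              .asyncCredentialUpdateEnabled(true)
              .build();
        }
      }
    }
    return instance;
  }
}
{code}

This still only helps within a single process; the cross-process case needs 
retry handling around the metadata calls themselves.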


{code}
2024-02-17T18:02:10,175  WARN [TThreadPoolServer WorkerProcess-22] 
fs.FileSystem: Failed to initialize fileystem 
s3a://impala-test-uswest2-1/test-warehouse/test_num_values_def_levels_mismatch_15b31ddb.db/too_many_def_levels:
 java.nio.file.AccessDeniedException: impala-test-uswest2-1: 
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
2024-02-17T18:02:10,175 ERROR [TThreadPoolServer WorkerProcess-22] 
utils.MetaStoreUtils: Got exception: java.nio.file.AccessDeniedException 
impala-test-uswest2-1: org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No 
AWS Credentials provided by TemporaryAWSCredentialsProvider 
SimpleAWSCredentialsProvider EnvironmentVariableCredentialsProvider 
IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
java.nio.file.AccessDeniedException: impala-test-uswest2-1: 
org.apache.hadoop.fs.s3a.auth.NoAuthWithAWSException: No AWS Credentials 
provided by TemporaryAWSCredentialsProvider SimpleAWSCredentialsProvider 
EnvironmentVariableCredentialsProvider IAMInstanceCredentialsProvider : 
software.amazon.awssdk.core.exception.SdkClientException: Unable to load 
credentials from system settings. Access key must be specified either via 
environment variable (AWS_ACCESS_KEY_ID) or system property (aws.accessKeyId).
at 
org.apache.hadoop.fs.s3a.AWSCredentialProviderList.maybeTranslateCredentialException(AWSCredentialProviderList.java:351)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:201) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:124) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.lambda$retry$4(Invoker.java:376) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:372) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.s3a.Invoker.retry(Invoker.java:347) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$verifyBucketExists$2(S3AFileSystem.java:972)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:543)
 ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:524)
 ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:445)
 ~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2748)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.verifyBucketExists(S3AFileSystem.java:970)
 ~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.doBucketProbing(S3AFileSystem.java:859) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:715) 
~[hadoop-aws-3.1.1.7.2.18.0-620.jar:?]
at 
org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3452) 
~[hadoop-common-3.1.1.7.2.18.0-620.jar:?]
at org.apache.hadoop.fs.FileSystem.access$300

[jira] [Resolved] (HADOOP-19172) Upgrade aws-java-sdk to 1.12.720

2024-05-16 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19172?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19172.
-
Fix Version/s: 3.3.9
   3.5.0
   3.4.1
   Resolution: Fixed

> Upgrade aws-java-sdk to 1.12.720
> 
>
> Key: HADOOP-19172
> URL: https://issues.apache.org/jira/browse/HADOOP-19172
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build, fs/s3
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> Update to the latest AWS SDK, to stop anyone worrying about the ion library 
> CVE https://nvd.nist.gov/vuln/detail/CVE-2024-21634
> This isn't exposed in the s3a client, but may be used downstream. 
> On v2 SDK releases, the v1 SDK is only used during builds; on 3.3.x it is shipped.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19073) WASB: Fix connection leak in FolderRenamePending

2024-05-15 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19073.
-
Resolution: Fixed

> WASB: Fix connection leak in FolderRenamePending
> 
>
> Key: HADOOP-19073
> URL: https://issues.apache.org/jira/browse/HADOOP-19073
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/azure
>Affects Versions: 3.3.6
>Reporter: xy
>Assignee: xy
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>
> Fix a connection leak in FolderRenamePending when getting bytes.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19176) S3A Xattr headers need hdfs-compatible prefix

2024-05-15 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19176:
---

 Summary: S3A Xattr headers need hdfs-compatible prefix
 Key: HADOOP-19176
 URL: https://issues.apache.org/jira/browse/HADOOP-19176
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.6, 3.4.0
Reporter: Steve Loughran


The s3a xattr list needs a prefix compatible with HDFS, or existing code which 
tries to copy attributes between stores can break.

We need a prefix of {user/trusted/security/system/raw}.

Now, a problem: currently xattrs are used by the magic committer to propagate 
file size progress; renaming the prefix will break existing code. But as it's 
read-only, we could modify Spark to look for both old and new values (a sketch 
follows the stack trace below).

{code}

org.apache.hadoop.HadoopIllegalArgumentException: An XAttr name must be 
prefixed with user/trusted/security/system/raw, followed by a '.'
at org.apache.hadoop.hdfs.XAttrHelper.buildXAttr(XAttrHelper.java:77) 
at org.apache.hadoop.hdfs.DFSClient.setXAttr(DFSClient.java:2835) 
at 
org.apache.hadoop.hdfs.DistributedFileSystem$59.doCall(DistributedFileSystem.java:3106)
 
at 
org.apache.hadoop.hdfs.DistributedFileSystem$59.doCall(DistributedFileSystem.java:3102)
 
at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at 
org.apache.hadoop.hdfs.DistributedFileSystem.setXAttr(DistributedFileSystem.java:3115)
 
at org.apache.hadoop.fs.FileSystem.setXAttr(FileSystem.java:3097)

{code}
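A sketch of the read-side compatibility shim mentioned above, assuming the 
magic committer's marker attribute gains a "user." prefix; the attribute names 
are assumptions for illustration:

{code:java}
import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: probe for the new prefixed xattr first, falling back to
// the legacy unprefixed name. Attribute names here are illustrative.
public class MagicMarkerCompat {
  public static byte[] getMarker(FileSystem fs, Path path)
      throws IOException {
    try {
      return fs.getXAttr(path, "user.header.x-hadoop-s3a-magic-data-length");
    } catch (IOException e) {
      // older release: attribute still published under the old name
      return fs.getXAttr(path, "header.x-hadoop-s3a-magic-data-length");
    }
  }
}
{code}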




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-18958) Improve UserGroupInformation debug log

2024-05-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-18958.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

>  Improve UserGroupInformation debug log
> ---
>
> Key: HADOOP-18958
> URL: https://issues.apache.org/jira/browse/HADOOP-18958
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 3.3.0, 3.3.5
>Reporter: wangzhihui
>Assignee: wangzhihui
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0
>
> Attachments: 20231029-122825-1.jpeg, 20231029-122825.jpeg, 
> 20231030-143525.jpeg, image-2023-10-29-09-47-56-489.png, 
> image-2023-10-30-14-35-11-161.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>       Using “new Exception()” to print the call stack of the "doAs" method in 
> the UserGroupInformation class prints meaningless exception information and 
> too many call-stack frames; this is not conducive to troubleshooting.
> *example:*
> !20231029-122825.jpeg|width=991,height=548!
>  
> *improved result* :
>  
> !image-2023-10-29-09-47-56-489.png|width=1099,height=156!
> !20231030-143525.jpeg|width=572,height=674!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Reopened] (HADOOP-18958) UserGroupInformation debug log improve

2024-05-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-18958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HADOOP-18958:
-

> UserGroupInformation debug log improve
> --
>
> Key: HADOOP-18958
> URL: https://issues.apache.org/jira/browse/HADOOP-18958
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: common
>Affects Versions: 3.3.0, 3.3.5
>Reporter: wangzhihui
>Priority: Minor
>  Labels: pull-request-available
> Attachments: 20231029-122825-1.jpeg, 20231029-122825.jpeg, 
> 20231030-143525.jpeg, image-2023-10-29-09-47-56-489.png, 
> image-2023-10-30-14-35-11-161.png
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
>       Using “new Exception()” to print the call stack of the "doAs" method in 
> the UserGroupInformation class prints meaningless exception information and 
> too many call-stack frames; this is not conducive to troubleshooting.
> *example:*
> !20231029-122825.jpeg|width=991,height=548!
>  
> *improved result* :
>  
> !image-2023-10-29-09-47-56-489.png|width=1099,height=156!
> !20231030-143525.jpeg|width=572,height=674!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19175) update s3a committer docs

2024-05-14 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19175:
---

 Summary: update s3a committer docs
 Key: HADOOP-19175
 URL: https://issues.apache.org/jira/browse/HADOOP-19175
 Project: Hadoop Common
  Issue Type: Improvement
  Components: documentation, fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Update s3a committer docs

* declare that the magic committer is stable and make it the recommended one
* show how to use the new command "mapred successfile" to print the success file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19172) Upgrade aws-java-sdk to 1.12.720

2024-05-13 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19172:
---

 Summary: Upgrade aws-java-sdk to 1.12.720
 Key: HADOOP-19172
 URL: https://issues.apache.org/jira/browse/HADOOP-19172
 Project: Hadoop Common
  Issue Type: Improvement
  Components: build, fs/s3
Affects Versions: 3.3.6, 3.4.0
Reporter: Steve Loughran


Update to the latest AWS SDK, to stop anyone worrying about the ion library CVE 
https://nvd.nist.gov/vuln/detail/CVE-2024-21634

This isn't exposed in the s3a client, but may be used downstream. 

On v2 SDK releases, the v1 SDK is only used during builds; on 3.3.x it is 
shipped.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19171) AWS v2: handle alternative forms of connection failure

2024-05-13 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19171:
---

 Summary: AWS v2: handle alternative forms of connection failure
 Key: HADOOP-19171
 URL: https://issues.apache.org/jira/browse/HADOOP-19171
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.6, 3.4.0
Reporter: Steve Loughran


We've had reports of network connection failures surfacing deeper in the stack, 
where we don't convert them to AWSApiCallTimeoutException, so they aren't 
retried properly (retire the connection and repeat).


{code}
Unable to execute HTTP request: Broken pipe (Write failed)
{code}


{code}
 Your socket connection to the server was not read from or written to within 
the timeout period. Idle connections will be closed. (Service: Amazon S3; 
Status Code: 400; Error Code: RequestTimeout
{code}





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19161) S3A: support a comma separated list of performance flags

2024-05-02 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19161:
---

 Summary: S3A: support a comma separated list of performance flags
 Key: HADOOP-19161
 URL: https://issues.apache.org/jira/browse/HADOOP-19161
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Affects Versions: 3.4.1
Reporter: Steve Loughran
Assignee: Steve Loughran


HADOOP-19072 shows we want to add more optimisations than that of HADOOP-18930.

* Extending the new optimisations to the existing option is brittle
* Adding explicit options for each feature gets complex fast.

Proposed
* A new class S3APerformanceFlags keeps all the flags
* it builds this from a String[] of values, which can be extracted from 
getConf()
* it can also support a "*" option to mean "everything"
* this class can also be handed off to hasPathCapability() and do the right 
thing.

Proposed optimisations
* create file (we will hook up HADOOP-18930)
* mkdir (HADOOP-19072)
* delete (probe for parent path)
* rename (probe for source path)

We could think of more, with different names, later.
The goal is to make it possible to strip out every HTTP request we do for 
safety/posix compliance, so applications have the option of turning off what 
they don't need. A sketch of the flag parsing follows.
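A sketch of what that parsing might look like; the class, enum values and 
method names are assumptions based on the list above, not the final API:

{code:java}
import java.util.EnumSet;
import java.util.Locale;

// Sketch only: parse a comma-separated list of performance flags,
// with "*" meaning "enable everything".
public final class S3APerformanceFlags {
  public enum Flag { CREATE, MKDIR, DELETE, RENAME }

  private final EnumSet<Flag> flags;

  private S3APerformanceFlags(EnumSet<Flag> flags) {
    this.flags = flags;
  }

  public static S3APerformanceFlags parse(String... values) {
    EnumSet<Flag> set = EnumSet.noneOf(Flag.class);
    for (String v : values) {
      String name = v.trim();
      if ("*".equals(name)) {
        return new S3APerformanceFlags(EnumSet.allOf(Flag.class));
      }
      if (!name.isEmpty()) {
        set.add(Flag.valueOf(name.toUpperCase(Locale.ROOT)));
      }
    }
    return new S3APerformanceFlags(set);
  }

  public boolean enabled(Flag flag) {
    return flags.contains(flag);
  }
}
{code}

Usage would then be something like 
S3APerformanceFlags.parse(conf.getTrimmedStrings(KEY)). Unknown names could 
either fail fast, as Flag.valueOf() does here, or be logged and skipped for 
forward compatibility; the latter is friendlier across versions.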



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19146) noaa-cors-pds bucket access with global endpoint fails

2024-04-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19146.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> noaa-cors-pds bucket access with global endpoint fails
> --
>
> Key: HADOOP-19146
> URL: https://issues.apache.org/jira/browse/HADOOP-19146
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs/s3, test
>Affects Versions: 3.4.0
>Reporter: Viraj Jasani
>Assignee: Viraj Jasani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> All tests accessing noaa-cors-pds use us-east-1 region, as configured at 
> bucket level. If a global endpoint is configured (e.g. us-west-2), they fail 
> to access the bucket.
>  
> Sample error:
> {code:java}
> org.apache.hadoop.fs.s3a.AWSRedirectException: Received permanent redirect 
> response to region [us-east-1].  This likely indicates that the S3 region 
> configured in fs.s3a.endpoint.region does not match the AWS region containing 
> the bucket.: null (Service: S3, Status Code: 301, Request ID: 
> PMRWMQC9S91CNEJR, Extended Request ID: 
> 6Xrg9thLiZXffBM9rbSCRgBqwTxdLAzm6OzWk9qYJz1kGex3TVfdiMtqJ+G4vaYCyjkqL8cteKI/NuPBQu5A0Q==)
>     at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:253)
>     at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:155)
>     at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:4041)
>     at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:3947)
>     at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$getFileStatus$26(S3AFileSystem.java:3924)
>     at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
>     at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
>     at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
>     at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2716)
>     at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2735)
>     at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:3922)
>     at org.apache.hadoop.fs.Globber.getFileStatus(Globber.java:115)
>     at org.apache.hadoop.fs.Globber.doGlob(Globber.java:349)
>     at org.apache.hadoop.fs.Globber.glob(Globber.java:202)
>     at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.lambda$globStatus$35(S3AFileSystem.java:4956)
>     at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
>     at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
>     at 
> org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
>     at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2716)
>     at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.trackDurationAndSpan(S3AFileSystem.java:2735)
>     at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.globStatus(S3AFileSystem.java:4949)
>     at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:313)
>     at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.listStatus(FileInputFormat.java:281)
>     at 
> org.apache.hadoop.mapreduce.lib.input.FileInputFormat.getSplits(FileInputFormat.java:445)
>     at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeNewSplits(JobSubmitter.java:311)
>     at 
> org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:328)
>     at 
> org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:201)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1677)
>     at org.apache.hadoop.mapreduce.Job$11.run(Job.java:1674)
>  {code}
> {code:java}
> Caused by: software.amazon.awssdk.services.s3.model.S3Exception: null 
> (Service: S3, Status Code: 301, Request ID: PMRWMQC9S91CNEJR, Extended 
> Request ID: 
> 6Xrg9thLiZXffBM9rbSCRgBqwTxdLAzm6OzWk9qYJz1kGex3TVfdiMtqJ+G4vaYCyjkqL8cteKI/NuPBQu5A0Q==)
>     at 
> software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
>     at 
> software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
>     at 
> software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handle(AwsXmlPredicatedResponseHandler.java:85)
>     at 
>

[jira] [Resolved] (HADOOP-19159) Fix hadoop-aws document for fs.s3a.committer.abort.pending.uploads

2024-04-29 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19159.
-
Fix Version/s: 3.3.9
   3.5.0
   3.4.1
   Resolution: Fixed

> Fix hadoop-aws document for fs.s3a.committer.abort.pending.uploads
> --
>
> Key: HADOOP-19159
> URL: https://issues.apache.org/jira/browse/HADOOP-19159
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Xi Chen
>Assignee: Xi Chen
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> The description of `fs.s3a.committer.abort.pending.uploads` in the 
> _Concurrent Jobs writing to the same destination_ section is not entirely 
> correct.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19158) Support delegating ByteBufferPositionedReadable to vector reads

2024-04-25 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19158:
---

 Summary: Support delegating ByteBufferPositionedReadable to vector 
reads
 Key: HADOOP-19158
 URL: https://issues.apache.org/jira/browse/HADOOP-19158
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs, fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Make it easy for any stream with vector IO to support 
ByteBufferPositionedReadable.

Specifically, ByteBufferPositionedReadable.readFully() is exactly a single 
range read, so it is easy to map to the vector API.

The simpler read() call, which can return less data, isn't part of the vector 
API. Proposed: invoke readFully() but convert an EOFException to -1; a sketch 
follows.
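A sketch of that proposal, written as a standalone adapter under the 
assumption that the stream already implements ByteBufferPositionedReadable; 
names are illustrative:

{code:java}
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.fs.ByteBufferPositionedReadable;

// Sketch only: adapt the strict readFully() contract to the laxer
// read() one by mapping EOFException to the -1 end-of-stream result.
public class PositionedReadAdapter {
  public static int read(ByteBufferPositionedReadable in, long position,
      ByteBuffer buf) throws IOException {
    int wanted = buf.remaining();
    try {
      in.readFully(position, buf);  // exactly one range read
      return wanted;
    } catch (EOFException e) {
      return -1;  // range could not be fully satisfied
    }
  }
}
{code}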



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19157) [ABFS] Filesystem contract tests to use methodPath for robust parallel test runs

2024-04-23 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19157:
---

 Summary: [ABFS] Filesystem contract tests to use methodPath for 
robust parallel test runs
 Key: HADOOP-19157
 URL: https://issues.apache.org/jira/browse/HADOOP-19157
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure, test
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


hadoop-azure supports parallel test runs, but unlike hadoop-aws, the azure ones 
are parallelised across methods in the same test suites.

This can fail badly where contract tests have hard-coded filenames and assume 
that they can use them across all test cases. It shows up when you are testing 
on a store with reduced IO capacity, triggering retries and making some test 
cases slower.

Fix: hadoop-common contract tests to use methodPath() names



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19102) [ABFS]: FooterReadBufferSize should not be greater than readBufferSize

2024-04-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19102.
-
Fix Version/s: 3.5.0
   3.4.1
   Resolution: Fixed

> [ABFS]: FooterReadBufferSize should not be greater than readBufferSize
> --
>
> Key: HADOOP-19102
> URL: https://issues.apache.org/jira/browse/HADOOP-19102
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Pranav Saxena
>Assignee: Pranav Saxena
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> The method `optimisedRead` creates a buffer array of size `readBufferSize`. 
> If footerReadBufferSize is greater than readBufferSize, abfs will attempt to 
> read more data than the buffer array can hold, which causes an exception.
> Change: To avoid this, we will keep footerBufferSize = 
> min(readBufferSizeConfig, footerBufferSizeConfig)
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19153) hadoop-common still exports logback as a transitive dependency

2024-04-17 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19153:
---

 Summary: hadoop-common still exports logback as a transitive 
dependency
 Key: HADOOP-19153
 URL: https://issues.apache.org/jira/browse/HADOOP-19153
 Project: Hadoop Common
  Issue Type: Bug
  Components: build, common
Affects Versions: 3.4.0
Reporter: Steve Loughran


Even though HADOOP-19084 set out to stop it, somehow ZooKeeper's declaration of 
a logback dependency is still contaminating the hadoop-common dependency graph, 
causing problems downstream.





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19079) HttpExceptionUtils to check that loaded class is really an exception before instantiation

2024-04-11 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19079.
-
Fix Version/s: 3.3.9
   3.5.0
   3.4.1
   Resolution: Fixed

> HttpExceptionUtils to check that loaded class is really an exception before 
> instantiation
> -
>
> Key: HADOOP-19079
> URL: https://issues.apache.org/jira/browse/HADOOP-19079
> Project: Hadoop Common
>  Issue Type: Task
>  Components: common, security
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> It can be dangerous taking class names as inputs from HTTP messages, even if 
> we control the source. The issue is in HttpExceptionUtils in hadoop-common 
> (the validateResponse method).
> I can provide a PR that will highlight the issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19096) [ABFS] Enhancing Client-Side Throttling Metrics Updation Logic

2024-04-11 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19096?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19096.
-
Fix Version/s: 3.5.0
   Resolution: Fixed

> [ABFS] Enhancing Client-Side Throttling Metrics Updation Logic
> --
>
> Key: HADOOP-19096
> URL: https://issues.apache.org/jira/browse/HADOOP-19096
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.1
>Reporter: Anuj Modi
>Assignee: Anuj Modi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0, 3.4.1
>
>
> ABFS has a client-side throttling mechanism which works on the metrics 
> collected from past requests. If requests fail due to throttling at the 
> server, we update our metrics, and the client-side backoff is calculated 
> based on those metrics.
> This PR enhances the logic to decide which requests should be considered when 
> computing the client-side backoff interval, as follows (a compact sketch 
> follows the list):
> For each request made by ABFS driver, we will determine if they should 
> contribute to Client-Side Throttling based on the status code and result:
>  # Status code in 2xx range: Successful Operations should contribute.
>  # Status code in 3xx range: Redirection Operations should not contribute.
>  # Status code in 4xx range: User Errors should not contribute.
>  # Status code is 503: Throttling Error should contribute only if they are 
> due to client limits breach as follows:
>  ## 503, Ingress Over Account Limit: Should Contribute
>  ## 503, Egress Over Account Limit: Should Contribute
>  ## 503, TPS Over Account Limit: Should Contribute
>  ## 503, Other Server Throttling: Should not Contribute.
>  # Status code in 5xx range other than 503: Should not Contribute.
>  # IOException and UnknownHostExceptions: Should not Contribute.
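A compact sketch of that decision table; the method shape and the 
error-string matching are assumptions, not the actual ABFS code:

{code:java}
// Sketch only: should a request's outcome feed the client-side
// throttling metrics? Follows the quoted rules above.
static boolean contributesToThrottlingMetrics(int status, String error) {
  if (status >= 200 && status < 300) {
    return true;  // successful operations contribute
  }
  if (status == 503 && error != null) {
    // only client-limit breaches contribute
    return error.contains("Ingress Over Account Limit")
        || error.contains("Egress Over Account Limit")
        || error.contains("TPS Over Account Limit");
  }
  // 3xx, 4xx, other 5xx, and IOExceptions do not contribute
  return false;
}
{code}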



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19098) Vector IO: consistent specified rejection of overlapping ranges

2024-04-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19098.
-
Resolution: Fixed

> Vector IO: consistent specified rejection of overlapping ranges
> ---
>
> Key: HADOOP-19098
> URL: https://issues.apache.org/jira/browse/HADOOP-19098
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/s3
>Affects Versions: 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> Related to PARQUET-2171's question: "how do you deal with overlapping 
> ranges?"
> I believe s3a rejects this, but the other impls may not.
> Proposed: the FS spec to say
> * "overlap triggers IllegalArgumentException".
> * special case: 0-byte ranges may be short-circuited to return an empty 
> buffer even without checking file length etc.
> Contract tests to validate this (+ common helper code to do so; a sketch of 
> the check follows below).
> I'll copy the validation stuff into the parquet PR for consistency with older 
> releases.
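A sketch of the kind of overlap check the contract tests would validate; the 
helper and class names are illustrative:

{code:java}
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import org.apache.hadoop.fs.FileRange;

// Sketch only: sort ranges by offset, then require each range to start
// at or after the end of its predecessor; any overlap is rejected with
// an IllegalArgumentException, as the proposed spec text says.
public class RangeValidation {
  public static void requireDisjoint(List<? extends FileRange> ranges) {
    List<? extends FileRange> sorted = ranges.stream()
        .sorted(Comparator.comparingLong(FileRange::getOffset))
        .collect(Collectors.toList());
    FileRange prev = null;
    for (FileRange r : sorted) {
      if (prev != null
          && r.getOffset() < prev.getOffset() + prev.getLength()) {
        throw new IllegalArgumentException(
            "Overlapping ranges " + prev + " and " + r);
      }
      prev = r;
    }
  }
}
{code}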



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19101) Vectored Read into off-heap buffer broken in fallback implementation

2024-04-10 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19101.
-
Fix Version/s: 3.3.9
   3.4.1
   Resolution: Fixed

> Vectored Read into off-heap buffer broken in fallback implementation
> 
>
> Key: HADOOP-19101
> URL: https://issues.apache.org/jira/browse/HADOOP-19101
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure
>Affects Versions: 3.4.0, 3.3.6
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> {{VectoredReadUtils.readInDirectBuffer()}} always starts off reading at 
> position zero even when the range is at a different offset. As a result, you 
> can get incorrect information.
> Thankfully, the fix is straightforward: we pass in a FileRange and use its 
> offset as the starting position (a sketch follows below).
> However, this does mean that all shipping releases 3.3.5-3.4.0 cannot safely 
> read vectorIO into direct buffers through HDFS, ABFS or GCS. Note that we 
> have never seen this in production because the parquet and ORC libraries both 
> read into on-heap storage.
> Those libraries need to be audited to make sure that they never attempt to 
> read into off-heap DirectBuffers. This is a bit trickier than you would think 
> because an allocator is passed in. For PARQUET-2171 we will 
> * only invoke the API on streams which explicitly declare their support for 
> the API (so fallback in parquet itself)
> * not invoke when direct buffer allocation is in use.
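A sketch of the corrected fallback path, staging through an on-heap array and 
seeking to the range's own offset; the class and method names are illustrative:

{code:java}
import java.io.IOException;
import java.nio.ByteBuffer;
import org.apache.hadoop.fs.FileRange;
import org.apache.hadoop.fs.PositionedReadable;

// Sketch only: read one range into a direct buffer via an on-heap
// staging array, starting at range.getOffset() rather than zero
// (the bug described above).
public class DirectBufferRead {
  public static void readRange(PositionedReadable in, FileRange range,
      ByteBuffer direct) throws IOException {
    byte[] staging = new byte[range.getLength()];
    in.readFully(range.getOffset(), staging, 0, staging.length);
    direct.put(staging);
    direct.flip();  // ready for the caller to consume
  }
}
{code}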



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19144) S3A prefetching to support Vector IO

2024-04-04 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19144:
---

 Summary: S3A prefetching to support Vector IO
 Key: HADOOP-19144
 URL: https://issues.apache.org/jira/browse/HADOOP-19144
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Add explicit support for vector IO in s3a prefetching stream.

* if a range is in one or more cached blocks, it SHALL be read from cache and 
returned
* if a range is not in cache: TBD
* if a range is partially in cache: TBD

These are the same decisions that abfs has to make: should the client 
fetch/cache the block, or just do one or more GET requests?

A big issue is: does caching of data fetched in a range request make any sense 
at all? Or, more specifically: does fetching the blocks in which range requests 
are found make sense?

Simply going to the store is a lot simpler.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19140) [ABFS, S3A] Add IORateLimiter api to hadoop common

2024-04-03 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19140:
---

 Summary: [ABFS, S3A] Add IORateLimiter api to hadoop common
 Key: HADOOP-19140
 URL: https://issues.apache.org/jira/browse/HADOOP-19140
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs, fs/azure, fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Create a rate limiter API in hadoop-common from which code (initially the 
manifest committer and bulk delete) can request IO capacity for a specific 
operation.

This can be exported by filesystems to support shared rate limiting across all 
threads. A sketch of one possible shape for the API follows.

Pulled from the HADOOP-19093 PR.
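One possible shape for such an API, sketched under the assumption that callers 
block until capacity is granted; the real interface may well differ:

{code:java}
import java.time.Duration;

// Sketch only: callers name the operation and how much IO capacity they
// want; the limiter blocks until capacity is available and reports how
// long the caller waited, which is useful for statistics collection.
public interface IORateLimiter {
  Duration acquireIOCapacity(String operation, int capacity);
}
{code}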



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-19115) upgrade to nimbus-jose-jwt 9.37.2 due to CVE

2024-04-02 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-19115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-19115.
-
Fix Version/s: 3.3.9
   3.5.0
   3.4.1
 Assignee: PJ Fanning
   Resolution: Fixed

> upgrade to nimbus-jose-jwt 9.37.2 due to CVE
> 
>
> Key: HADOOP-19115
> URL: https://issues.apache.org/jira/browse/HADOOP-19115
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, CVE
>Affects Versions: 3.4.0, 3.5.0
>Reporter: PJ Fanning
>Assignee: PJ Fanning
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.9, 3.5.0, 3.4.1
>
>
> https://github.com/advisories/GHSA-gvpg-vgmx-xg6w



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Created] (HADOOP-19131) Assist reflection iO with WrappedOperations class

2024-03-28 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-19131:
---

 Summary: Assist reflection iO with WrappedOperations class
 Key: HADOOP-19131
 URL: https://issues.apache.org/jira/browse/HADOOP-19131
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs, fs/azure, fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Parquet, Avro etc. are still stuck building with older Hadoop releases.

This makes using new APIs hard (PARQUET-2117) and means that APIs which are 
five years old (!) such as HADOOP-15229 just aren't picked up.

This lack of openFile() adoption hurts working with files in cloud storage, as
* extra HEAD requests are made
* read policies can't be explicitly set
* split start/end can't be passed down

Proposed (see the sketch after this list)
# create class org.apache.hadoop.io.WrappedOperations
# add methods to wrap the APIs
# test in contract tests via reflection loading: this verifies we have done it 
properly.
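A sketch of how a downstream library could probe for such a wrapper by 
reflection; the class name follows the proposal above, and the method 
signature is a hypothetical illustration, not a final API:

{code:java}
import java.lang.reflect.Method;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

// Sketch only: probe once for the wrapper class, falling back to plain
// FileSystem.open() on older Hadoop releases where it is absent.
public class WrappedOperationsProbe {
  public static Method findOpenFile() {
    try {
      Class<?> wrapped =
          Class.forName("org.apache.hadoop.io.WrappedOperations");
      // hypothetical signature: openFile(fs, path, readPolicy)
      return wrapped.getMethod("openFile",
          FileSystem.class, Path.class, String.class);
    } catch (ReflectiveOperationException e) {
      return null;  // older Hadoop: caller uses FileSystem.open()
    }
  }
}
{code}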



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org


