[jira] [Resolved] (HADOOP-8574) Enable starting hadoop services from inside OSGi

2021-01-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-8574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-8574.

Resolution: Won't Fix

> Enable starting hadoop services from inside OSGi
> 
>
> Key: HADOOP-8574
> URL: https://issues.apache.org/jira/browse/HADOOP-8574
> Project: Hadoop Common
>  Issue Type: New Feature
>Reporter: Guillaume Nodet
>Priority: Major
>
> This JIRA captures the needed things in order to start hadoop services in 
> OSGi.
> The main idea I have used so far consists of:
>   * using the OSGi ConfigAdmin to store the hadoop configuration
>   * in that configuration, using a few boolean properties to determine which 
> services should be started (nameNode, dataNode ...)
>   * exposing a configured URL handler so that the whole OSGi runtime can use 
> urls in hdfs:/xxx
>   * using an OSGi ManagedService, so that when the configuration changes, the 
> services are stopped and restarted with the new configuration
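The ManagedService idea sketched above can be illustrated with plain Java stand-ins; the `Service` interface, `HadoopServiceManager` class, and boolean-flag names below are hypothetical and only mimic the OSGi `ManagedService.updated()` contract, they are not the actual proposal's code.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Sketch: a configuration with boolean flags decides which services run,
 * and each configuration update stops and restarts the services.
 */
public class HadoopServiceManager {

    /** Hypothetical stand-in for a hadoop service such as a NameNode. */
    interface Service {
        void start(Map<String, String> conf);
        void stop();
    }

    private final Map<String, Service> available = new HashMap<>();
    private final List<String> running = new ArrayList<>();

    void register(String flag, Service service) {
        available.put(flag, service);
    }

    /** Mimics ManagedService.updated(): restart services per the new config. */
    synchronized void updated(Map<String, String> conf) {
        // stop everything started under the old configuration
        for (String name : running) {
            available.get(name).stop();
        }
        running.clear();
        // start only the services whose boolean flag is set in the config
        for (Map.Entry<String, Service> e : available.entrySet()) {
            if (Boolean.parseBoolean(conf.getOrDefault(e.getKey(), "false"))) {
                e.getValue().start(conf);
                running.add(e.getKey());
            }
        }
    }

    List<String> runningServices() {
        return new ArrayList<>(running);
    }
}
```

A real binding would receive `updated()` callbacks from ConfigAdmin rather than direct calls.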



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: common-dev-unsubscr...@hadoop.apache.org
For additional commands, e-mail: common-dev-h...@hadoop.apache.org



[jira] [Resolved] (HADOOP-6484) OSGI headers in jar manifest

2021-01-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-6484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-6484.

Resolution: Won't Fix

> OSGI headers in jar manifest
> 
>
> Key: HADOOP-6484
> URL: https://issues.apache.org/jira/browse/HADOOP-6484
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: build
>Reporter: Leen Toelen
>Priority: Major
> Attachments: HADOOP-6484.patch
>
>
> When using hadoop inside an OSGI environment one needs to change the 
> META-INF/MANIFEST.MF file to include OSGI headers (version and symbolic 
> name). It would be convenient to do this in the default build.xml. 
> There are no runtime dependencies.
> An easy way of doing this is to use the bnd ant task: 
> http://www.aqute.biz/Code/Bnd
> {code:xml}
> <taskdef name="bnd" classname="aQute.bnd.ant.BndTask"
>   classpath="bnd.jar"/>
> <bnd classpath="src"
>   eclipse="true"
>   failok="false"
>   exceptions="true"
>   files="test.bnd"/>
> {code}






[jira] [Resolved] (HADOOP-7977) Allow Hadoop clients and services to run in an OSGi container

2021-01-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-7977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-7977.

Resolution: Won't Fix

> Allow Hadoop clients and services to run in an OSGi container
> -
>
> Key: HADOOP-7977
> URL: https://issues.apache.org/jira/browse/HADOOP-7977
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: util
>Affects Versions: 0.24.0
> Environment: OSGi client runtime (Spring &c), possibly service 
> runtime (e.g. Apache Karaf)
>Reporter: Steve Loughran
>Priority: Minor
>
> There's been past discussion on running Hadoop client and service code in 
> OSGi. This JIRA issue exists to wrap up the needs and issues. 
> # client-side use of public Hadoop APIs would seem most important.
> # service-side deployments could offer benefits. The non-standard Hadoop Java 
> security configuration may interfere with this goal.
> # testing would all be functional, with dependencies on external services, 
> which makes things harder.






[jira] [Resolved] (HADOOP-14201) Some 2.8.0 unit tests are failing on windows

2021-01-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14201?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-14201.
-
Resolution: Won't Fix

> Some 2.8.0 unit tests are failing on windows
> 
>
> Key: HADOOP-14201
> URL: https://issues.apache.org/jira/browse/HADOOP-14201
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: test
>Affects Versions: 2.8.0
> Environment: Windows Server 2012.
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-14201-001.patch
>
>
> Some of the 2.8.0 tests are failing locally, without much in the way of 
> diagnostics. They may be false alarms related to system, VM setup, 
> performance, or they may be a sign of a problem.






[jira] [Resolved] (HADOOP-15894) getFileChecksum() needs to adopt S3Guard

2021-01-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-15894.
-
Resolution: Won't Fix

> getFileChecksum() needs to adopt S3Guard
> 
>
> Key: HADOOP-15894
> URL: https://issues.apache.org/jira/browse/HADOOP-15894
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: lqjacklee
>Priority: Minor
>
> Encountered a 404 failure in 
> {{ITestS3AMiscOperations.testNonEmptyFileChecksumsUnencrypted}}; a newly 
> created file wasn't seen. Even with S3Guard enabled, that method does nothing 
> to query the store for the file's existence.






[jira] [Resolved] (HADOOP-15722) regression: Hadoop 2.7.7 release breaks spark submit

2021-01-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-15722.
-
Resolution: Won't Fix

> regression: Hadoop 2.7.7 release breaks spark submit
> 
>
> Key: HADOOP-15722
> URL: https://issues.apache.org/jira/browse/HADOOP-15722
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build, conf, security
>Affects Versions: 2.7.7
>Reporter: Steve Loughran
>Priority: Major
>
> SPARK-25330 highlights that upgrading spark to hadoop 2.7.7 is causing a 
> regression in client setup, with things only working when 
> {{Configuration.getRestrictParserDefault(Object resource)}} = false.






[jira] [Resolved] (HADOOP-15485) reduce/tune read failure fault injection on inconsistent client

2021-01-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-15485.
-
Resolution: Won't Fix

S3 is now consistent; wontfix.

> reduce/tune read failure fault injection on inconsistent client
> ---
>
> Key: HADOOP-15485
> URL: https://issues.apache.org/jira/browse/HADOOP-15485
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.1.0
>Reporter: Steve Loughran
>Priority: Major
>
> If you crank up the s3guard directory inconsistency rate to stress test the 
> directory listings, then the read failure rate can go up high enough that 
> read IO fails.
> Maybe that read injection should only happen for the first few seconds of a 
> stream being created, to better model delayed consistency, or at least limit 
> the #of times it can surface in a stream. (This would imply some kind of 
> stream-specific binding.)
> Otherwise: provide a way to explicitly set it, including disabling it.
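The stream-specific binding suggested above could look roughly like the following: failures are only injected within a window after stream creation, so long-lived reads are not swamped. The class name, window default, and `maybeFail()` hook are all illustrative assumptions, not the actual inconsistency-client code.

```java
import java.io.IOException;
import java.util.Random;

/**
 * Sketch: inject read failures only in the first few seconds of a stream's
 * life, modelling delayed consistency rather than permanent flakiness.
 */
public class DelayedConsistencyFaultInjector {
    private final long createdAtMillis;
    private final long windowMillis;        // inject only this long after creation
    private final double failureProbability;
    private final Random random;

    DelayedConsistencyFaultInjector(long windowMillis, double probability, long seed) {
        this.createdAtMillis = System.currentTimeMillis();
        this.windowMillis = windowMillis;
        this.failureProbability = probability;
        this.random = new Random(seed);
    }

    /** Called before each read; throws only inside the injection window. */
    void maybeFail() throws IOException {
        long age = System.currentTimeMillis() - createdAtMillis;
        if (age < windowMillis && random.nextDouble() < failureProbability) {
            throw new IOException("injected inconsistency failure");
        }
    }
}
```

Setting the window to zero gives the "explicitly disable" option from the last line above.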






[jira] [Resolved] (HADOOP-14766) Cloudup: an object store high performance dfs put command

2021-01-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-14766.
-
Resolution: Won't Fix

This lives here: https://github.com/steveloughran/cloudstore

> Cloudup: an object store high performance dfs put command
> -
>
> Key: HADOOP-14766
> URL: https://issues.apache.org/jira/browse/HADOOP-14766
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 2.8.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Attachments: HADOOP-14766-001.patch, HADOOP-14766-002.patch
>
>
> {{hdfs put local s3a://path}} is suboptimal as it treewalks down the 
> source tree and then, sequentially, copies each file up: it opens the file 
> as a stream, copies the contents to a buffer, writes that to the dest file, 
> and repeats.
> For S3A that hurts because
> * it's doing the upload inefficiently: the file can be uploaded just by 
> handing the pathname to the AWS transfer manager
> * it is doing it sequentially, when some parallelised upload would work. 
> * as the ordering of the files to upload is a recursive treewalk, it doesn't 
> spread the upload across multiple shards. 
> Better:
> * build the list of files to upload
> * upload in parallel, picking entries from the list at random and spreading 
> across a pool of uploaders
> * upload straight from the local file (copyFromLocalFile())
> * track IO load (files created/second) to estimate risk of throttling.
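The "better" scheme above can be sketched with stdlib concurrency: list first, shuffle so uploads spread across key shards, then upload from a fixed thread pool. The `Uploader` interface is a hypothetical stand-in for copyFromLocalFile(); none of this is the actual patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Sketch of a parallel, shuffled bulk upload. */
public class ParallelUploadSketch {

    /** Hypothetical stand-in for an object-store upload call. */
    interface Uploader {
        void upload(String localPath) throws Exception;
    }

    static int uploadAll(List<String> files, Uploader uploader, int threads)
            throws InterruptedException {
        List<String> shuffled = new ArrayList<>(files);
        Collections.shuffle(shuffled);   // break the recursive-treewalk ordering
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        AtomicInteger succeeded = new AtomicInteger();
        for (String file : shuffled) {
            pool.submit(() -> {
                try {
                    uploader.upload(file);          // straight from local file
                    succeeded.incrementAndGet();
                } catch (Exception e) {
                    // a real tool would record the failure and retry
                }
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return succeeded.get();
    }

    public static void main(String[] args) throws Exception {
        List<String> files = Arrays.asList("a.txt", "b.txt", "c.txt", "d.txt");
        System.out.println(uploadAll(files, path -> { }, 2)); // 4
    }
}
```

Throttling-risk tracking (files created/second) would sit inside the submitted task, counting successes per interval.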






[jira] [Created] (HADOOP-17483) magic committer to be enabled for all S3 buckets

2021-01-21 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17483:
---

 Summary: magic committer to be enabled for all S3 buckets
 Key: HADOOP-17483
 URL: https://issues.apache.org/jira/browse/HADOOP-17483
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.0
Reporter: Steve Loughran


now that S3 is consistent, there is no need to disable the magic committer for 
safety.

remove option to enable magic committer (fs.s3a.committer.magic.enabled) and 
the associated checks/probes through the code.

May want to retain the constants and probes just for completeness/API/CLI 
consistency. 






[jira] [Created] (HADOOP-17480) S3A docs to state s3 is consistent, deprecate S3Guard

2021-01-20 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17480:
---

 Summary: S3A docs to state s3 is consistent, deprecate S3Guard
 Key: HADOOP-17480
 URL: https://issues.apache.org/jira/browse/HADOOP-17480
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Review S3 docs, remove all statements of inconsistency, s3guard docs to say 
unneeded






[jira] [Resolved] (HADOOP-17473) Optimise abfs incremental listings

2021-01-20 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17473?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17473.
-
Resolution: Duplicate

> Optimise abfs incremental listings
> --
>
> Key: HADOOP-17473
> URL: https://issues.apache.org/jira/browse/HADOOP-17473
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Bilahari T H
>Priority: Minor
>
> [Uber JIRA|https://issues.apache.org/jira/browse/HADOOP-17474]






[jira] [Resolved] (HADOOP-17433) Skipping network I/O in S3A getFileStatus(/) breaks ITestAssumeRole

2021-01-19 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17433.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

> Skipping network I/O in S3A getFileStatus(/) breaks ITestAssumeRole
> ---
>
> Key: HADOOP-17433
> URL: https://issues.apache.org/jira/browse/HADOOP-17433
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 1h
>  Remaining Estimate: 0h
>
> Test failure in ITestAssumeRole.testAssumeRoleRestrictedPolicyFS if the test 
> bucket is unguarded. I've been playing with my bucket settings so this 
> probably didn't surface before. 
> test arguments -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=keep  
> -Dfs.s3a.directory.marker.audit=true






[jira] [Created] (HADOOP-17476) ITestAssumeRole.testAssumeRoleBadInnerAuth failure

2021-01-18 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17476:
---

 Summary: ITestAssumeRole.testAssumeRoleBadInnerAuth failure
 Key: HADOOP-17476
 URL: https://issues.apache.org/jira/browse/HADOOP-17476
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.3.1
Reporter: Steve Loughran


failure of test {{ITestAssumeRole.testAssumeRoleBadInnerAuth}} where a failure 
was expected, but the error text was wrong.

Either STS has changed its error text or something is changing where the 
failure happens.

Given the nature of the test, it may be simplest to keep the expectation of an 
FS init failure, but remove the text match.






[jira] [Created] (HADOOP-17470) Collect more S3A IOStatistics

2021-01-14 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17470:
---

 Summary: Collect more S3A IOStatistics
 Key: HADOOP-17470
 URL: https://issues.apache.org/jira/browse/HADOOP-17470
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Reporter: Steve Loughran


collect more stats of 
* how long FS API calls are taking
* what is going wrong

API
* Duration of rename(), delete() and other common API calls.
* openFile: how long the open operation is taking, split between open and 
openFile
* time to initialize an FS instance (maybe better in FileSystem.cache?)
* create: time to complete the create checks
* finishedWrite: time to execute finishedWrite operations (which may include 
bulk deletes)

Failure tracking: counters of what failures we are seeing:
interrupts, connection failures, auth failures.

This would be done in S3ARetryPolicy with an IOStatisticsStore passed in.
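The duration side of this can be sketched as a tracker that takes exactly one pair of clock reads per tracked operation, aggregating counts and total time per operation name. The class and method names here are illustrative, not the S3A statistics classes.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.LongAdder;

/** Sketch: per-operation duration and invocation-count tracking. */
public class DurationTrackerSketch {
    private final ConcurrentMap<String, LongAdder> counts = new ConcurrentHashMap<>();
    private final ConcurrentMap<String, LongAdder> totalNanos = new ConcurrentHashMap<>();

    /** Run an operation, recording its duration under the given name. */
    public <T> T track(String op, Callable<T> body) throws Exception {
        long start = System.nanoTime();               // one clock read at entry
        try {
            return body.call();
        } finally {
            long elapsed = System.nanoTime() - start; // one clock read at exit
            counts.computeIfAbsent(op, k -> new LongAdder()).increment();
            totalNanos.computeIfAbsent(op, k -> new LongAdder()).add(elapsed);
        }
    }

    public long count(String op) {
        LongAdder a = counts.get(op);
        return a == null ? 0 : a.sum();
    }

    public long totalNanos(String op) {
        LongAdder a = totalNanos.get(op);
        return a == null ? 0 : a.sum();
    }
}
```

Failure counters would follow the same map-of-LongAdder pattern, incremented from the retry policy when an exception is classified.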






[jira] [Created] (HADOOP-17471) ABFS to collect IOStatistics

2021-01-14 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17471:
---

 Summary: ABFS to collect IOStatistics
 Key: HADOOP-17471
 URL: https://issues.apache.org/jira/browse/HADOOP-17471
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/azure
Reporter: Steve Loughran




Add stats collection to ABFS FS operations, especially
* create
* open
* delete
* rename
* getFilesStatus
* list
* attribute get/set






[jira] [Resolved] (HADOOP-16830) Add Public IOStatistics API

2021-01-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-16830.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

Done. Created HADOOP-17469 for followup

> Add Public IOStatistics API
> ---
>
> Key: HADOOP-16830
> URL: https://issues.apache.org/jira/browse/HADOOP-16830
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: fs, fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> Applications like to collect statistics on what specific operations take, by 
> collecting exactly those operations done during the execution of FS API 
> calls by their individual worker threads, and returning these to their job 
> driver.
> * S3A has a statistics API for some streams, but it's a non-standard one; 
> Impala &c can't use it
> * FileSystem storage statistics are public, but as they aren't cross-thread, 
> they don't aggregate properly
> Proposed
> # A new IOStatistics interface to serve up statistics
> # S3A to implement
> # other stores to follow
> # Pass-through from the usual wrapper classes (FS data input/output streams)
> It's hard to think about how best to offer an API for operation context 
> stats, and how to actually implement.
> ThreadLocal isn't enough, because the helper threads need to update the 
> thread-local value of the instigator.
> My initial PoC doesn't address that issue, but it shows what I'm thinking of.






[jira] [Created] (HADOOP-17469) IOStatistics Phase II

2021-01-14 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17469:
---

 Summary: IOStatistics Phase II
 Key: HADOOP-17469
 URL: https://issues.apache.org/jira/browse/HADOOP-17469
 Project: Hadoop Common
  Issue Type: New Feature
  Components: fs, fs/azure, fs/s3
Affects Versions: 3.3.1
Reporter: Steve Loughran


Continue IOStatistics development with goals of

* easy adoption in applications
* better instrumentation in the hadoop codebase (distcp?)
* more stats in the abfs and s3a connectors

A key piece has to be a thread-level context for statistics, so that app code 
doesn't have to explicitly ask for the stats of each worker thread. Instead, 
filesystem components update the context stats as well as thread stats (when?) 
and then apps can pick them up.

* need to manage performance by minimising inefficient lookups, lock 
acquisition etc. on what should be memory-only ops (read(), write()),
* for duration tracking, cut down on calls to System.currentTime() so that 
only one call is made per operation, 
* need to propagate the context into worker threads







[jira] [Resolved] (HADOOP-17455) [s3a] Intermittent failure of ITestS3ADeleteCost.testDeleteSingleFileInDir

2021-01-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17455.
-
Resolution: Fixed

> [s3a] Intermittent failure of ITestS3ADeleteCost.testDeleteSingleFileInDir
> --
>
> Key: HADOOP-17455
> URL: https://issues.apache.org/jira/browse/HADOOP-17455
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.0
>Reporter: Gabor Bota
>Assignee: Steve Loughran
>Priority: Major
> Fix For: 3.3.1
>
>
> Test failed against ireland intermittently with the following config:
> {{mvn clean verify -Dparallel-tests -DtestsThreadCount=8}}
> xml based config in auth-keys.xml:
> {code:xml}
> <property>
>   <name>fs.s3a.metadatastore.impl</name>
>   <value>org.apache.hadoop.fs.s3a.s3guard.NullMetadataStore</value>
> </property>
> {code}






[jira] [Resolved] (HADOOP-17456) S3A ITestPartialRenamesDeletes.testPartialDirDelete[bulk-delete=true] failure

2021-01-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17456.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

> S3A ITestPartialRenamesDeletes.testPartialDirDelete[bulk-delete=true] failure
> -
>
> Key: HADOOP-17456
> URL: https://issues.apache.org/jira/browse/HADOOP-17456
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.3.1
>
>
> Failure in {{ITestPartialRenamesDeletes.testPartialDirDelete}}; wrong #of 
> delete requests. 
> build options: -Dparallel-tests -DtestsThreadCount=6 -Dscale -Dmarkers=delete 
> -Ds3guard -Ddynamo
> The assert fails on a line changed in HADOOP-17271; the assumption being that 
> there are some test run states where things happen differently. 






[jira] [Resolved] (HADOOP-13845) s3a to instrument duration of HTTP calls

2021-01-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-13845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-13845.
-
Resolution: Fixed

> s3a to instrument duration of HTTP calls
> 
>
> Key: HADOOP-13845
> URL: https://issues.apache.org/jira/browse/HADOOP-13845
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 2.9.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.3.1
>
>
> HADOOP-13844 proposes pulling out the swift duration classes for reuse; this 
> patch proposes instrumenting s3a with it.
> One interesting question: what to do with the values. For now, they could 
> just be printed, but it might be interesting to include in FS stats collected 
> at the end of a run. However, those are all assumed to be simple counters 
> where merging is a matter of addition; these are more like metrics.






[jira] [Resolved] (HADOOP-14159) Add some Java-8 friendly way to work with RemoteIterable, especially listings

2021-01-13 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14159?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-14159.
-
Fix Version/s: 3.3.1
   Resolution: Duplicate

Done inside HADOOP-17450.

> Add some Java-8 friendly way to work with RemoteIterable, especially listings
> -
>
> Key: HADOOP-14159
> URL: https://issues.apache.org/jira/browse/HADOOP-14159
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 3.0.0-alpha2
>Reporter: Steve Loughran
>Priority: Minor
> Fix For: 3.3.1
>
>
> There's a fair amount of Hadoop code which uses {{FileSystem.listStatus(path)}} 
> just to get a {{FileStatus[]}} array which it can then iterate over in 
> a {{for}} loop.
> This is inefficient and scales badly, as the entire listing is done before 
> the compute; it cannot handle directories with millions of entries. 
> The listLocatedStatus() calls return a RemoteIterator class, which can't be 
> used in for loops as it has the right to throw an IOE in any hasNext/next 
> call. That doesn't matter, as we now have closures and simple stream 
> operations.
> {code}
>  listLocatedStatus(path).filter((st) -> st.length > 0).apply(st -> 
> fs.delete(st.path))
> {code}
> See? We could do shiny new closure things. It wouldn't necessarily need 
> changes to FileSystem either, just something which took {{RemoteIterator}} 
> and let you chain some closures off it, similar to the java 8 streams 
> operations.
> Once implemented, we can move to using it in the Hadoop code wherever we use 
> listFiles() today.
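The kind of helper proposed above can be sketched without touching FileSystem: wrap a RemoteIterator-style interface, whose hasNext/next may throw IOException, behind chainable closures. The interface and helper names below are illustrative stand-ins, not the HADOOP-17450 API.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.function.Predicate;

/** Sketch of closure-friendly helpers over a RemoteIterator-like type. */
public class RemoteIterators {

    /** Minimal stand-in for org.apache.hadoop.fs.RemoteIterator. */
    interface RemoteIterator<T> {
        boolean hasNext() throws IOException;
        T next() throws IOException;
    }

    /** Adapt an in-memory list; a real source would page results remotely. */
    static <T> RemoteIterator<T> fromList(List<T> list) {
        Iterator<T> it = list.iterator();
        return new RemoteIterator<T>() {
            public boolean hasNext() { return it.hasNext(); }
            public T next() { return it.next(); }
        };
    }

    /**
     * Consume the iterator, keeping entries matching the predicate.
     * The IOException can surface in any hasNext/next call, which is why
     * a plain for-each loop cannot be used.
     */
    static <T> List<T> filter(RemoteIterator<T> it, Predicate<T> pred)
            throws IOException {
        List<T> out = new ArrayList<>();
        while (it.hasNext()) {
            T t = it.next();
            if (pred.test(t)) {
                out.add(t);
            }
        }
        return out;
    }

    public static void main(String[] args) throws IOException {
        // analogous to keeping only non-empty file statuses
        List<Integer> lengths = Arrays.asList(0, 5, 0, 12);
        System.out.println(filter(fromList(lengths), len -> len > 0)); // [5, 12]
    }
}
```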






[jira] [Resolved] (HADOOP-14000) S3guard metadata stores to support millions of entries

2021-01-13 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14000?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-14000.
-
Resolution: Won't Fix

> S3guard metadata stores to support millions of entries
> --
>
> Key: HADOOP-14000
> URL: https://issues.apache.org/jira/browse/HADOOP-14000
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-beta1
>Reporter: Steve Loughran
>Priority: Major
>
> S3 repos can have millions of child entries
> Currently {{DirListingMetaData}} can't and {{MetadataStore.listChildren(Path 
> path)}} won't be able to handle directories that big, for listing, deleting 
> or naming.
> We will need a paged response from the listing operation, something which can 
> be iterated over.
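A paged, iterable listing of the kind asked for above can be sketched as an iterator that lazily pulls one page at a time from the store. The `PageSource` interface and method names are hypothetical stand-ins, not the MetadataStore API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Sketch: iterate millions of child entries without materialising them all. */
public class PagedListingSketch {

    /** Hypothetical stand-in for a store serving listings page by page. */
    interface PageSource {
        List<String> listPage(String path, int pageSize, int pageIndex);
    }

    static Iterator<String> pagedListing(PageSource store, String path, int pageSize) {
        return new Iterator<String>() {
            int page = 0;
            Iterator<String> current = Collections.emptyIterator();
            boolean exhausted = false;

            public boolean hasNext() {
                // fetch the next page only when the current one is drained
                while (!current.hasNext() && !exhausted) {
                    List<String> next = store.listPage(path, pageSize, page++);
                    if (next.isEmpty()) {
                        exhausted = true;
                    } else {
                        current = next.iterator();
                    }
                }
                return current.hasNext();
            }

            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                return current.next();
            }
        };
    }

    public static void main(String[] args) {
        // fake store with 5 entries, served in pages of 2
        List<String> entries = Arrays.asList("a", "b", "c", "d", "e");
        PageSource store = (path, size, idx) -> {
            int from = idx * size;
            return from >= entries.size()
                    ? Collections.<String>emptyList()
                    : entries.subList(from, Math.min(from + size, entries.size()));
        };
        int n = 0;
        for (Iterator<String> it = pagedListing(store, "/", 2); it.hasNext(); it.next()) {
            n++;
        }
        System.out.println(n); // 5
    }
}
```

Delete and rename over big directories would drive the same iterator, acting on each page rather than on a single giant listing.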






[jira] [Resolved] (HADOOP-17451) intermittent failure of S3A tests which make assertions on statistics/IOStatistics

2021-01-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17451.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

> intermittent failure of S3A tests which make assertions on 
> statistics/IOStatistics
> --
>
> Key: HADOOP-17451
> URL: https://issues.apache.org/jira/browse/HADOOP-17451
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Intermittent failure of ITestHuge* upload tests, when doing parallel test 
> runs.
> The count of bytes uploaded through StorageStatistics isn't updated. Maybe 
> the expected counter isn't updated, and somehow in a parallel run with 
> recycled FS instances/set up directory structure this surfaces the way it 
> doesn't in a single test run.






[jira] [Resolved] (HADOOP-17272) ABFS Streams to support IOStatistics API

2021-01-12 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17272.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Merged to trunk, but I will backport when I merge the whole set of iostats 
changes into -3.3

> ABFS Streams to  support IOStatistics API
> -
>
> Key: HADOOP-17272
> URL: https://issues.apache.org/jira/browse/HADOOP-17272
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.3.1
>Reporter: Steve Loughran
>Assignee: Mehakmeet Singh
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 5h 50m
>  Remaining Estimate: 0h
>
> ABFS input/output streams to support IOStatistics API






[jira] [Created] (HADOOP-17461) Add thread-level IOStatistics Context

2021-01-08 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17461:
---

 Summary: Add thread-level IOStatistics Context
 Key: HADOOP-17461
 URL: https://issues.apache.org/jira/browse/HADOOP-17461
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs, fs/azure, fs/s3
Affects Versions: 3.3.1
Reporter: Steve Loughran


For effective reporting of the iostatistics of individual worker threads, we 
need a thread-level context which IO components update.

* this context needs to be passed into background threads performing work on 
behalf of a task.
* IO components (streams, iterators, filesystems) need to update this context's 
statistics as they perform work
* without double counting anything.

I imagine a ThreadLocal IOStatisticContext which will be updated in the 
FileSystem API Calls. This context MUST be passed into the background threads 
used by a task, so that IO is correctly aggregated.

I don't want streams, listIterators &c to do the updating as there is more risk 
of double counting. However, we need to see their statistics if we want to know 
things like "bytes discarded in backwards seeks". And I don't want to be 
updating a shared context object on every read() call.
If all we want is store IO (HEAD, GET, DELETE, list performance etc) then the 
FS is sufficient. 
If we do want the stream-specific detail, then I propose
* caching the context in the constructor
* updating it only in close() or unbuffer() (as we do from S3AInputStream to 
S3AInstrumentation)
* excluding those we know the FS already collects.
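The ThreadLocal-context-plus-explicit-propagation idea can be sketched as follows; `IOStatsContext` and the counter names are illustrative assumptions, not the eventual API. The point of interest is that the instigator's context object must be handed to worker threads, since a plain ThreadLocal would give each worker its own empty copy.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.LongAdder;

/** Sketch: thread-level statistics context shared with worker threads. */
public class StatsContextSketch {

    static final class IOStatsContext {
        final ConcurrentMap<String, LongAdder> counters = new ConcurrentHashMap<>();

        void increment(String key, long n) {
            counters.computeIfAbsent(key, k -> new LongAdder()).add(n);
        }

        long lookup(String key) {
            LongAdder a = counters.get(key);
            return a == null ? 0 : a.sum();
        }
    }

    /** The per-thread context that FS API calls would update. */
    static final ThreadLocal<IOStatsContext> CURRENT =
            ThreadLocal.withInitial(IOStatsContext::new);

    public static void main(String[] args) throws Exception {
        IOStatsContext taskContext = CURRENT.get();
        ExecutorService pool = Executors.newFixedThreadPool(2);
        for (int i = 0; i < 4; i++) {
            pool.submit(() -> {
                CURRENT.set(taskContext);   // propagate before doing any IO
                CURRENT.get().increment("stream_bytes_read", 100);
            });
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        // all worker IO aggregated into the instigator's context
        System.out.println(taskContext.lookup("stream_bytes_read")); // 400
    }
}
```

LongAdder keeps the hot-path update cheap, matching the concern above about not paying for a shared context on every read().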









[jira] [Resolved] (HADOOP-14391) s3a: auto-detect region for bucket and use right endpoint

2021-01-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-14391?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-14391.
-
Resolution: Cannot Reproduce

Latest AWS SDK works this out if you don't set the endpoint. 

It adds the overhead of an extra HEAD / call, issued with a v4 signature; if 
that is rejected by s3 central, the 400 error code includes the actual region 
of the bucket.

> s3a: auto-detect region for bucket and use right endpoint
> -
>
> Key: HADOOP-14391
> URL: https://issues.apache.org/jira/browse/HADOOP-14391
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0-alpha2
>Reporter: Aaron Fabbri
>Assignee: Aaron Fabbri
>Priority: Major
>
> Specifying the S3A endpoint ({{fs.s3a.endpoint}}) is
> - *required* for regions which only support v4 authentication
> - A good practice for all regions.
> The user experience of having to configure endpoints is not great.  Often it 
> is neglected and leads to additional cost, reduced performance, or failures 
> for v4 auth regions.
> I want to explore an option which, when enabled, auto-detects the region for 
> an s3 bucket and uses the proper endpoint.  Not sure if this is possible or 
> anyone has looked into it yet.






[jira] [Resolved] (HADOOP-15348) S3A Input Stream bytes read counter isn't getting through to StorageStatistics/instrumentation properly

2021-01-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-15348.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Fixed as part of HADOOP-17271

> S3A Input Stream bytes read counter isn't getting through to 
> StorageStatistics/instrumentation properly
> ---
>
> Key: HADOOP-15348
> URL: https://issues.apache.org/jira/browse/HADOOP-15348
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Steve Loughran
>Priority: Minor
> Fix For: 3.4.0
>
>
> TL;DR: we should have common storage statistics for bytes read and bytes 
> written, and S3A should use them in its instrumentation and have enum names 
> to match.
> # in the S3AInputStream we call 
> {{S3AInstrumentation.StreamStatistics.bytesRead(long)}}, which adds the 
> amount to {{bytesRead}}, in a read(), readFully, or forward seek() reading in 
> data
> # and in {{S3AInstrumentation.mergeInputStreamStatistics}}, that is pulled 
> into streamBytesRead.
> # which has a Statistics name of ""stream_bytes_read"
> # but that is served up in the Storage statistics as 
> "STREAM_SEEK_BYTES_READ", which is the wrong name.
> # and there isn't a common name for the counter across other filesystems.
> For now: people can use the wrong name in the enum; we may want to think 
> about retaining it when adding the correct name. And maybe add a 
> @Evolving/@LimitedPrivate scope pair to the enum






[jira] [Created] (HADOOP-17460) s3guard tool dumpStorageStatistics to move to IOStatistics

2021-01-08 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17460:
---

 Summary: s3guard tool dumpStorageStatistics to move to IOStatistics
 Key: HADOOP-17460
 URL: https://issues.apache.org/jira/browse/HADOOP-17460
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


The S3GuardTool CLI's -verbose option prints the storage statistics of the FS. 
If it moves to IOStatistics, it will print latencies as well as operation 
counts.






[jira] [Resolved] (HADOOP-17347) ABFS: Optimise read for small files/tails of files

2021-01-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17347.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

If this is to target branch-3.3, then a full test rerun is needed. I'd like 
some docs too, maybe something on tuning the ABFS connector for performance. 
Too many under-documented config options are coming in now.

> ABFS: Optimise read for small files/tails of files
> --
>
> Key: HADOOP-17347
> URL: https://issues.apache.org/jira/browse/HADOOP-17347
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure
>Affects Versions: 3.4.0
>Reporter: Bilahari T H
>Assignee: Bilahari T H
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 12h 10m
>  Remaining Estimate: 0h
>
> Optimize read performance for the following scenarios
>  # Read small files completely
>  Files that are of size smaller than the read buffer size can be considered 
> as small files. In case of such files it would be better to read the full 
> file into the AbfsInputStream buffer.
>  # Read last block if the read is for footer
>  If the read is for the last 8 bytes, read the full file.
>  This will optimize reads for parquet files. [Parquet file 
> format|https://www.ellicium.com/parquet-file-format-structure/]
> Both these optimizations will be present under configs as follows
>  # fs.azure.read.smallfilescompletely
>  # fs.azure.read.optimizefooterread






[jira] [Reopened] (HADOOP-17403) S3A ITestPartialRenamesDeletes.testRenameDirFailsInDelete failure: missing directory marker

2021-01-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HADOOP-17403:
-

> S3A ITestPartialRenamesDeletes.testRenameDirFailsInDelete failure: missing 
> directory marker
> ---
>
> Key: HADOOP-17403
> URL: https://issues.apache.org/jira/browse/HADOOP-17403
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.3.1
>
>
> Seemingly transient failure of the test ITestPartialRenamesDeletes with the 
> latest HADOOP-17244 changes in: an expected directory marker was not found.
> Test run was (unintentionally) sequential, markers=delete, s3guard on
> {code}
> -Dmarkers=delete -Ds3guard -Ddynamo -Dscale 
> {code}
> Hasn't come back since.
> The bucket's retention policy was authoritative, but no dirs were declared as 
> such






[jira] [Resolved] (HADOOP-17312) S3AInputStream to be resilient to failures in abort(); translate AWS Exceptions

2021-01-06 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17312.
-
Fix Version/s: 3.3.1
   Resolution: Duplicate

> S3AInputStream to be resilient to failures in abort(); translate AWS Exceptions
> --
>
> Key: HADOOP-17312
> URL: https://issues.apache.org/jira/browse/HADOOP-17312
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0, 3.2.1
>Reporter: Steve Loughran
>Assignee: Yongjun Zhang
>Priority: Major
> Fix For: 3.3.1
>
>
> Stack overflow issue complaining about ConnectionClosedException during 
> S3AInputStream close(), seems triggered by an EOF exception in abort. That 
> is: we are trying to close the stream and it is failing because the stream is 
> closed. oops.
> https://stackoverflow.com/questions/64412010/pyspark-org-apache-http-connectionclosedexception-premature-end-of-content-leng
> Looking @ the stack, we aren't translating AWS exceptions in abort() to IOEs, 
> which may be a factor.






[jira] [Resolved] (HADOOP-16133) S3A statistic collection underrecords bytes written in helper threads

2021-01-06 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-16133.
-
Fix Version/s: 3.3.1
   Resolution: Done

> S3A statistic collection underrecords bytes written in helper threads
> -
>
> Key: HADOOP-16133
> URL: https://issues.apache.org/jira/browse/HADOOP-16133
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.1.2
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.3.1
>
>
> Applications collecting per-thread statistics from S3A get underreporting of 
> bytes written, as all byte written in the worker call update those in a 
> different thread.
> Proposed: 
> * the bytes upload statistics are uploaded in the primary thread as a block 
> is queued for write, not after in the completion phase in the other thread
> * final {{WriteOperationsHelper.writeSuccessful()}} takes the final 
> statistics for its own entertainment
> Really I want context-specific storage statistics.






[jira] [Resolved] (HADOOP-16468) S3AFileSystem.getContentSummary() to use listFiles(recursive)

2021-01-06 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16468?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-16468.
-
Fix Version/s: hadoop-13704
   Resolution: Duplicate

> S3AFileSystem.getContentSummary() to use listFiles(recursive)
> -
>
> Key: HADOOP-16468
> URL: https://issues.apache.org/jira/browse/HADOOP-16468
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Priority: Major
> Fix For: hadoop-13704
>
>
> HIVE-22054 discusses how they use getContentSummary to see if a directory is 
> empty.
> This is implemented in FileSystem as a recursive treewalk, with all the costs 
> there.
> Hive is moving off it; once that is in it won't be so much of an issue. But 
> if we wanted to speed up older versions of Hive, we could move the operation 
> to using a flat list
> That would give us the file size rapidly; the directory count would have to 
> be worked out by tracking parent dirs of all paths (and all entries ending 
> with /), and adding them up






[jira] [Resolved] (HADOOP-17359) [Hadoop-Tools]S3A MultiObjectDeleteException after uploading a file

2021-01-06 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17359.
-
Resolution: Cannot Reproduce

Well, if you can't reproduce it and nobody else can, closing as 
cannot-reproduce.

If you do see it again on a recent Hadoop 3.x build, reopen with a stack trace 
and anything else you can collect/share.

> [Hadoop-Tools]S3A MultiObjectDeleteException after uploading a file
> ---
>
> Key: HADOOP-17359
> URL: https://issues.apache.org/jira/browse/HADOOP-17359
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.10.0
>Reporter: Xun REN
>Priority: Minor
>
> Hello,
>  
> I am using org.apache.hadoop.fs.s3a.S3AFileSystem as implementation for S3 
> related operation.
> When I upload a file onto a path, it returns an error:
> {code:java}
> 20/11/05 11:49:13 ERROR s3a.S3AFileSystem: Partial failure of delete, 1 
> errors20/11/05 11:49:13 ERROR s3a.S3AFileSystem: Partial failure of delete, 1 
> errorscom.amazonaws.services.s3.model.MultiObjectDeleteException: One or more 
> objects could not be deleted (Service: null; Status Code: 200; Error Code: 
> null; Request ID: 767BEC034D0B5B8A; S3 Extended Request ID: 
> JImfJY9hCl/QvninqT9aO+jrkmyRpRcceAg7t1lO936RfOg7izIom76RtpH+5rLqvmBFRx/++g8=; 
> Proxy: null), S3 Extended Request ID: 
> JImfJY9hCl/QvninqT9aO+jrkmyRpRcceAg7t1lO936RfOg7izIom76RtpH+5rLqvmBFRx/++g8= 
> at 
> com.amazonaws.services.s3.AmazonS3Client.deleteObjects(AmazonS3Client.java:2287)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjects(S3AFileSystem.java:1137) 
> at org.apache.hadoop.fs.s3a.S3AFileSystem.removeKeys(S3AFileSystem.java:1389) 
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.deleteUnnecessaryFakeDirectories(S3AFileSystem.java:2304)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.finishedWrite(S3AFileSystem.java:2270) 
> at 
> org.apache.hadoop.fs.s3a.S3AFileSystem$WriteOperationHelper.writeSuccessful(S3AFileSystem.java:2768)
>  at 
> org.apache.hadoop.fs.s3a.S3ABlockOutputStream.close(S3ABlockOutputStream.java:371)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream$PositionCache.close(FSDataOutputStream.java:74)
>  at 
> org.apache.hadoop.fs.FSDataOutputStream.close(FSDataOutputStream.java:108) at 
> org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:69) at 
> org.apache.hadoop.io.IOUtils.copyBytes(IOUtils.java:128) at 
> org.apache.hadoop.fs.shell.CommandWithDestination$TargetFileSystem.writeStreamToFile(CommandWithDestination.java:488)
>  at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyStreamToTarget(CommandWithDestination.java:410)
>  at 
> org.apache.hadoop.fs.shell.CommandWithDestination.copyFileToTarget(CommandWithDestination.java:342)
>  at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:277)
>  at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPath(CommandWithDestination.java:262)
>  at org.apache.hadoop.fs.shell.Command.processPaths(Command.java:327) at 
> org.apache.hadoop.fs.shell.Command.processPathArgument(Command.java:299) at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processPathArgument(CommandWithDestination.java:257)
>  at org.apache.hadoop.fs.shell.Command.processArgument(Command.java:281) at 
> org.apache.hadoop.fs.shell.Command.processArguments(Command.java:265) at 
> org.apache.hadoop.fs.shell.CommandWithDestination.processArguments(CommandWithDestination.java:228)
>  at 
> org.apache.hadoop.fs.shell.CopyCommands$Put.processArguments(CopyCommands.java:285)
>  at 
> org.apache.hadoop.fs.shell.FsCommand.processRawArguments(FsCommand.java:119) 
> at org.apache.hadoop.fs.shell.Command.run(Command.java:175) at 
> org.apache.hadoop.fs.FsShell.run(FsShell.java:317) at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76) at 
> org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90) at 
> org.apache.hadoop.fs.FsShell.main(FsShell.java:380)20/11/05 11:49:13 ERROR 
> s3a.S3AFileSystem: bv/: "AccessDenied" - Access Denied
> {code}
> The problem is that Hadoop tries to create fake directories to map with S3 
> prefix and it cleans them after the operation. The cleaning is done from the 
> parent folder until the root folder.
> If we don't give the corresponding permission for some path, it will 
> encounter this problem:
> [https://github.com/apache/hadoop/blob/rel/release-2.10.0/hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java#L2296-L2301]
>  
> During uploading, I don't see any "fake" directories are created. Why should 
> we clean them if it is not really created ?
> It is the same for the other operations like rename or mkdir where the 
> "deleteUnnecessaryFakeDirectories" method is called.
> Maybe the solution is to check the delet

[jira] [Created] (HADOOP-17458) S3A to treat "SdkClientException: Data read has a different length than the expected" as EOFException

2021-01-06 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17458:
---

 Summary: S3A to treat "SdkClientException: Data read has a 
different length than the expected" as EOFException
 Key: HADOOP-17458
 URL: https://issues.apache.org/jira/browse/HADOOP-17458
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


A test run with network problems caught exceptions 
"com.amazonaws.SdkClientException: Data read has a different length than the 
expected:", which then escalated to failure.

These should be recoverable if they are recognised as such.

translateException could do this. Yes, it would have to look at the text, but 
as {{signifiesConnectionBroken()}} already does that for "Failed to sanitize 
XML document destined for handler class", it'd just be adding a new text 
string to look for.
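A sketch of what that addition could look like. {{signifiesConnectionBroken()}} 
is a real helper named above, but the class shape and method signatures below 
are illustrative assumptions, not the actual S3AUtils code:

```java
import java.io.EOFException;
import java.io.IOException;

/**
 * Illustrative sketch (not the actual S3A error-translation code) of
 * recognising message text that signifies a broken connection.
 */
final class ErrorTranslationSketch {

  // Message fragments that indicate the connection/stream broke mid-read.
  private static final String[] BROKEN_CONNECTION_MESSAGES = {
      "Failed to sanitize XML document destined for handler class",
      "Data read has a different length than the expected"
  };

  static boolean signifiesConnectionBroken(String message) {
    if (message == null) {
      return false;
    }
    for (String fragment : BROKEN_CONNECTION_MESSAGES) {
      if (message.contains(fragment)) {
        return true;
      }
    }
    return false;
  }

  /** Map a (hypothetical) SDK client exception to an IOException. */
  static IOException translate(RuntimeException sdkException) {
    if (signifiesConnectionBroken(sdkException.getMessage())) {
      // Treated as EOFException: the read path can reopen the stream and retry.
      return new EOFException(String.valueOf(sdkException.getMessage()));
    }
    return new IOException(sdkException);
  }
}
```

The point of mapping to EOFException is that the existing retry logic already 
treats end-of-stream conditions as recoverable.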








[jira] [Resolved] (HADOOP-17430) Restore ability to set Text to empty byte array

2021-01-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17430.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

>  Restore ability to set Text to empty byte array
> 
>
> Key: HADOOP-17430
> URL: https://issues.apache.org/jira/browse/HADOOP-17430
> Project: Hadoop Common
>  Issue Type: Wish
>  Components: common
>Reporter: gaozhan ding
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 3.5h
>  Remaining Estimate: 0h
>
> In the org.apache.hadoop.io.Text.clear() method, the comment says that we can 
> free the bytes by calling set(new byte[0]), but that no longer works. 
> Maybe the implementation should follow the comment.
>  
>  
> {code:java}
> // org.apache.hadoop.io.Text 
> /**
>  * Clear the string to empty.
>  *
>  * Note: For performance reasons, this call does not clear the
>  * underlying byte array that is retrievable via {@link #getBytes()}.
>  * In order to free the byte-array memory, call {@link #set(byte[])}
>  * with an empty byte array (For example, new byte[0]).
>  */
> public void clear() {
>   length = 0;
>   textLength = -1;
> }
> {code}
>  
>  






[jira] [Created] (HADOOP-17456) S3A ITestPartialRenamesDeletes.testPartialDirDelete[bulk-delete=true] failure

2021-01-05 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17456:
---

 Summary: S3A 
ITestPartialRenamesDeletes.testPartialDirDelete[bulk-delete=true] failure
 Key: HADOOP-17456
 URL: https://issues.apache.org/jira/browse/HADOOP-17456
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Failure in {{ITestPartialRenamesDeletes.testPartialDirDelete}}; wrong #of 
delete requests. 

build options: -Dparallel-tests -DtestsThreadCount=6 -Dscale -Dmarkers=delete 
-Ds3guard -Ddynamo

The assert fails on a line changed in HADOOP-17271; the assumption being that 
there are some test run states where things happen differently. 







[jira] [Resolved] (HADOOP-17403) S3A ITestPartialRenamesDeletes failure: missing directory marker

2021-01-05 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17403.
-
Fix Version/s: 3.3.1
 Assignee: Steve Loughran
   Resolution: Workaround

> S3A ITestPartialRenamesDeletes failure: missing directory marker
> 
>
> Key: HADOOP-17403
> URL: https://issues.apache.org/jira/browse/HADOOP-17403
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3, test
>Affects Versions: 3.3.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
> Fix For: 3.3.1
>
>
> Seemingly transient failure of the test ITestPartialRenamesDeletes with the 
> latest HADOOP-17244 changes in: an expected directory marker was not found.
> Test run was (unintentionally) sequential, markers=delete, s3guard on
> {code}
> -Dmarkers=delete -Ds3guard -Ddynamo -Dscale 
> {code}
> Hasn't come back since.
> The bucket's retention policy was authoritative, but no dirs were declared as 
> such






[jira] [Created] (HADOOP-17451) intermittent failure of S3A huge file upload tests: count of bytes uploaded == 0

2020-12-31 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17451:
---

 Summary: intermittent failure of S3A huge file upload tests: count 
of bytes uploaded == 0
 Key: HADOOP-17451
 URL: https://issues.apache.org/jira/browse/HADOOP-17451
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Intermittent failure of ITestHuge* upload tests, when doing parallel test runs.

The count of bytes uploaded through StorageStatistics isn't updated. Maybe the 
expected counter isn't updated, and somehow in a parallel run with recycled FS 
instances and a pre-existing directory structure this surfaces in a way it 
doesn't in a single test run.






[jira] [Resolved] (HADOOP-17271) S3A statistics to support IOStatistics

2020-12-31 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17271.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Merged into trunk; will go into branch-3.3 once stabilized.

> S3A statistics to support IOStatistics
> --
>
> Key: HADOOP-17271
> URL: https://issues.apache.org/jira/browse/HADOOP-17271
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 11.5h
>  Remaining Estimate: 0h
>
> S3A to rework statistics with
> * API + Implementation split of the interfaces used by subcomponents when 
> reporting stats
> * S3A Instrumentation to implement all the interfaces
> * streams, etc to all implement IOStatisticsSources and serve to callers
> * Add some tracking of durations of remote requests






[jira] [Resolved] (HADOOP-17450) hadoop-common to add IOStatistics API

2020-12-31 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17450.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Merged to trunk; will backport once the ready-to-merge follow-up PRs are in 
and all looks good.

> hadoop-common to add IOStatistics API
> -
>
> Key: HADOOP-17450
> URL: https://issues.apache.org/jira/browse/HADOOP-17450
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> Add the iostatistics API to hadoop-common with
> * public interfaces for querying statistics
> * serializable snapshot
> * logging support
> and implementation support for filesystems






[jira] [Created] (HADOOP-17450) hadoop-common to add IOStatistics API

2020-12-30 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17450:
---

 Summary: hadoop-common to add IOStatistics API
 Key: HADOOP-17450
 URL: https://issues.apache.org/jira/browse/HADOOP-17450
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs
Affects Versions: 3.3.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Add the iostatistics API to hadoop-common with

* public interfaces for querying statistics
* serializable snapshot
* logging support

and implementation support for filesystems
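An illustrative shape for such an API; the names below are deliberately 
simplified stand-ins, not the actual hadoop-common interfaces:

```java
import java.io.Serializable;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative: a queryable set of named counters. */
interface IOStatisticsSketch {
  Map<String, Long> counters();
}

/** Illustrative: anything (stream, FS, iterator) that can serve statistics. */
interface IOStatisticsSourceSketch {
  IOStatisticsSketch getIOStatistics();
}

/**
 * A serializable snapshot: copies the live counters at construction time,
 * so it can be logged or shipped without referencing the live source.
 */
final class SnapshotSketch implements IOStatisticsSketch, Serializable {
  private final TreeMap<String, Long> counters = new TreeMap<>();

  SnapshotSketch(IOStatisticsSketch source) {
    counters.putAll(source.counters());
  }

  @Override
  public Map<String, Long> counters() {
    return counters;
  }
}
```

The snapshot-of-a-live-source split is the key design choice: callers query 
statistics through a read-only interface, and anything that must outlive the 
source (logging, serialization) works on a copied snapshot.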






[jira] [Resolved] (HADOOP-17338) Intermittent S3AInputStream failures: Premature end of Content-Length delimited message body etc

2020-12-18 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17338.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

> Intermittent S3AInputStream failures: Premature end of Content-Length 
> delimited message body etc
> 
>
> Key: HADOOP-17338
> URL: https://issues.apache.org/jira/browse/HADOOP-17338
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Yongjun Zhang
>Assignee: Yongjun Zhang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
> Attachments: HADOOP-17338.001.patch
>
>  Time Spent: 3h 40m
>  Remaining Estimate: 0h
>
> We are seeing the following two kinds of intermittent exceptions when using 
> S3AInputSteam:
> 1.
> {code:java}
> Caused by: com.amazonaws.thirdparty.apache.http.ConnectionClosedException: 
> Premature end of Content-Length delimited message body (expected: 156463674; 
> received: 150001089
> at 
> com.amazonaws.thirdparty.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:178)
> at 
> com.amazonaws.thirdparty.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
> at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
> at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at 
> com.amazonaws.services.s3.internal.S3AbortableInputStream.read(S3AbortableInputStream.java:125)
> at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
> at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at 
> com.amazonaws.util.LengthCheckInputStream.read(LengthCheckInputStream.java:107)
> at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at org.apache.hadoop.fs.s3a.S3AInputStream.read(S3AInputStream.java:181)
> at java.io.DataInputStream.readFully(DataInputStream.java:195)
> at java.io.DataInputStream.readFully(DataInputStream.java:169)
> at 
> org.apache.parquet.hadoop.ParquetFileReader$ConsecutiveChunkList.readAll(ParquetFileReader.java:779)
> at 
> org.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:511)
> at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:130)
> at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:214)
> at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:227)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:208)
> at 
> org.apache.hadoop.hive.ql.io.parquet.read.ParquetRecordReaderWrapper.next(ParquetRecordReaderWrapper.java:63)
> at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:350)
> ... 15 more
> {code}
> 2.
> {code:java}
> Caused by: javax.net.ssl.SSLException: SSL peer shut down incorrectly
> at sun.security.ssl.InputRecord.readV3Record(InputRecord.java:596)
> at sun.security.ssl.InputRecord.read(InputRecord.java:532)
> at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:990)
> at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:948)
> at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
> at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.streamRead(SessionInputBufferImpl.java:137)
> at 
> com.amazonaws.thirdparty.apache.http.impl.io.SessionInputBufferImpl.read(SessionInputBufferImpl.java:198)
> at 
> com.amazonaws.thirdparty.apache.http.impl.io.ContentLengthInputStream.read(ContentLengthInputStream.java:176)
> at 
> com.amazonaws.thirdparty.apache.http.conn.EofSensorInputStream.read(EofSensorInputStream.java:135)
> at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
> at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at 
> com.amazonaws.internal.SdkFilterInputStream.read(SdkFilterInputStream.java:82)
> at com.amazonaws.event.ProgressInputStream.read(ProgressInputStream.java:180)
> at 
> com.amazonaws.internal.SdkF

[jira] [Created] (HADOOP-17434) Improve S3A upload statistics collection from ProgressEvent callbacks

2020-12-15 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17434:
---

 Summary: Improve S3A upload statistics collection from 
ProgressEvent callbacks
 Key: HADOOP-17434
 URL: https://issues.apache.org/jira/browse/HADOOP-17434
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Collection of S3A upload stats from ProgressEvent callbacks can be improved

Two similar but different implementations of listeners
* org.apache.hadoop.fs.s3a.S3ABlockOutputStream.BlockUploadProgress
* org.apache.hadoop.fs.s3a.ProgressableProgressListener. Used on simple PUT 
calls.

Both call back into S3A FS to incrementWriteOperations; BlockUploadProgress 
also updates S3AInstrumentation/IOStatistics.

* I'm not 100% confident that BlockUploadProgress is updating things 
(especially gauges of pending bytes) at the right time
* or that completion is being handled
* And the other interface doesn't update S3AInstrumentation; numbers are lost.
* And there's no incremental updating during 
{{CommitOperations.uploadFileToPendingCommit()}}, which doesn't call 
Progressable.progress() other than on every block.
* or in MultipartUploader 

Proposed: 
* a single Progress listener which updates BlockOutputStreamStatistics, used by 
all interfaces.
* WriteOperations to help set this up for callers
* And its uploadPart API to take a Progressable (or the progress listener to 
use for uploading that part)
* Multipart upload API to also add a progressable...would help for distcp-like 
applications.

Plus ITests to verify that the gauges come out right: at the end of each 
operation, the number of bytes pending upload == 0 and the number of bytes 
uploaded == the original size.
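A sketch of the accounting a single shared listener would do; the class and 
method names are hypothetical, not the S3A implementation, but the final 
invariant matches what the ITests above would verify:

```java
import java.util.concurrent.atomic.AtomicLong;

/**
 * Illustrative sketch of a single block-upload progress listener
 * (hypothetical names): one gauge of bytes pending upload, one counter
 * of bytes actually uploaded.
 */
final class BlockUploadProgressSketch {
  private final AtomicLong bytesPendingUpload = new AtomicLong();
  private final AtomicLong bytesUploaded = new AtomicLong();

  // A block is queued for upload: the pending-bytes gauge goes up.
  void blockQueued(long size) {
    bytesPendingUpload.addAndGet(size);
  }

  // Progress callback for bytes transferred on the wire: move the count
  // from "pending" to "uploaded".
  void bytesTransferred(long count) {
    bytesPendingUpload.addAndGet(-count);
    bytesUploaded.addAndGet(count);
  }

  long pending() {
    return bytesPendingUpload.get();
  }

  long uploaded() {
    return bytesUploaded.get();
  }
}
```

If every interface (block output stream, simple PUT, commit operations, 
multipart uploader) routed its events through one such listener, the 
end-of-operation check "pending == 0 and uploaded == original size" becomes a 
uniform assertion.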








[jira] [Created] (HADOOP-17433) Skipping network I/O in S3A getFileStatus(/) breaks ITestAssumeRole

2020-12-14 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17433:
---

 Summary: Skipping network I/O in S3A getFileStatus(/) breaks 
ITestAssumeRole
 Key: HADOOP-17433
 URL: https://issues.apache.org/jira/browse/HADOOP-17433
 Project: Hadoop Common
  Issue Type: Bug
  Components: fs/s3, test
Affects Versions: 3.3.0
Reporter: Steve Loughran
Assignee: Mukund Thakur


Test failure in ITestAssumeRole.testAssumeRoleRestrictedPolicyFS if the test 
bucket is unguarded. I've been playing with my bucket settings so this probably 
didn't surface before. 

test arguments -Dparallel-tests -DtestsThreadCount=4 -Dmarkers=keep  
-Dfs.s3a.directory.marker.audit=true






[jira] [Resolved] (HADOOP-17412) When `fs.s3a.connection.ssl.enabled=true`, Error when visit S3A with AKSK

2020-12-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17412.
-
Resolution: Duplicate

This is a duplicate of HADOOP-17017; you can't have "." in your bucket name 
without enabling path style access. Please update your configuration

> When `fs.s3a.connection.ssl.enabled=true`,   Error when visit S3A with AKSK
> ---
>
> Key: HADOOP-17412
> URL: https://issues.apache.org/jira/browse/HADOOP-17412
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
> Environment: jdk 1.8
> hadoop-3.3.0
>Reporter: angerszhu
>Priority: Major
> Attachments: image-2020-12-07-10-25-51-908.png
>
>
> When we update hadoop version from hadoop-3.2.1 to hadoop-3.3.0, Use AKSK 
> access s3a with ssl enabled, then this error happen
> {code:xml}
> <property>
>   <name>ipc.client.connection.maxidletime</name>
>   <value>2</value>
> </property>
> <property>
>   <name>fs.s3a.secret.key</name>
>   <value></value>
> </property>
> <property>
>   <name>fs.s3a.access.key</name>
>   <value></value>
> </property>
> <property>
>   <name>fs.s3a.aws.credentials.provider</name>
>   <value>org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider</value>
> </property>
> {code}
> !image-2020-12-07-10-25-51-908.png!






[jira] [Created] (HADOOP-17415) Use S3 content-range header to update length of an object during reads

2020-12-08 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17415:
---

 Summary: Use S3 content-range header to update length of an object 
during reads
 Key: HADOOP-17415
 URL: https://issues.apache.org/jira/browse/HADOOP-17415
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.0
Reporter: Steve Loughran


As part of all the openFile work, knowing the full length of an object allows a 
HEAD to be skipped. But code that knows only the splits doesn't know the final 
length of the file.

If the content-range header is used, then as soon as a single GET is initiated 
against an object and the field is returned, we can update the length of the 
S3A stream to its real/final length.

Also: when any input stream fails with an EOF exception, we can distinguish a 
stream interruption from a genuine read past the end of the file.
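As a sketch, extracting the total object length from the Content-Range response header (format per RFC 7233: "bytes start-end/total", with "*" when the total is unknown). The class and method names are illustrative, not the S3A code:

```java
/** Minimal sketch: pull the total length out of an HTTP Content-Range
 *  header such as "bytes 0-1023/2097152". */
public class ContentRange {

    /** @return the total object length, or -1 if absent or unknown ("/*"). */
    static long totalLength(String contentRange) {
        if (contentRange == null) {
            return -1;
        }
        int slash = contentRange.lastIndexOf('/');
        if (slash < 0) {
            return -1;
        }
        String total = contentRange.substring(slash + 1).trim();
        if (total.equals("*")) {
            return -1;  // length unknown per RFC 7233
        }
        try {
            return Long.parseLong(total);
        } catch (NumberFormatException e) {
            return -1;  // malformed header: leave the stream length alone
        }
    }

    public static void main(String[] args) {
        System.out.println(totalLength("bytes 0-1023/2097152")); // 2097152
        System.out.println(totalLength("bytes 0-1023/*"));       // -1
    }
}
```

On the first GET, a value >= 0 from this parse could replace the split-derived length of the stream; -1 would mean "keep the current estimate".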






[jira] [Created] (HADOOP-17414) Magic committer files don't have the count of bytes written collected by spark

2020-12-07 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17414:
---

 Summary: Magic committer files don't have the count of bytes 
written collected by spark
 Key: HADOOP-17414
 URL: https://issues.apache.org/jira/browse/HADOOP-17414
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.2.0
Reporter: Steve Loughran
Assignee: Steve Loughran


The Spark statistics tracking doesn't correctly assess the size of the uploaded 
files, as it only calls getFileStatus on the zero-byte objects, not the 
yet-to-manifest files.

Everything works with the staging committer purely because it's measuring the 
length of the files staged to the local FS, not the unmaterialized output.







[jira] [Created] (HADOOP-17409) Remove S3Guard - no longer needed

2020-12-03 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17409:
---

 Summary: Remove S3Guard - no longer needed
 Key: HADOOP-17409
 URL: https://issues.apache.org/jira/browse/HADOOP-17409
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Affects Versions: 3.3.0
Reporter: Steve Loughran


With Consistent S3, S3Guard is superfluous. 

We should stop developing it and wean people off it as soon as possible.

Then we can worry about what to do in the code. It has gradually insinuated its 
way through the layers, especially things like multi-object delete handling 
(see HADOOP-17244). Things would be a lot simpler without it






[jira] [Resolved] (HADOOP-17332) S3A marker tool mixes up -min and -max

2020-12-03 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17332?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17332.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

> S3A marker tool mixes up -min and -max
> --
>
> Key: HADOOP-17332
> URL: https://issues.apache.org/jira/browse/HADOOP-17332
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 2h
>  Remaining Estimate: 0h
>
> HADOOP-17227 manages to get -min and -max mixed up through the call chain.
> {code}
> hadoop s3guard markers -audit -max 2000  s3a://stevel-london/
> {code}
> leads to
> {code}
> 2020-10-27 18:11:44,434 [main] DEBUG s3guard.S3GuardTool 
> (S3GuardTool.java:main(2154)) - Exception raised
> 46: Marker count 0 out of range [2000 - 0]
>   at 
> org.apache.hadoop.fs.s3a.tools.MarkerTool$ScanResult.finish(MarkerTool.java:489)
>   at org.apache.hadoop.fs.s3a.tools.MarkerTool.run(MarkerTool.java:318)
>   at 
> org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:505)
>   at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>   at 
> org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:2134)
>   at 
> org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.main(S3GuardTool.java:2146)
> 2020-10-27 18:11:44,436 [main] INFO  util.ExitUtil 
> (ExitUtil.java:terminate(210)) - Exiting with status 46: 46: Marker count 0 
> out of range [2000 - 0]
> {code}
> Trivial fix.






[jira] [Resolved] (HADOOP-16879) s3a mkdirs() to not check dest for a dir marker

2020-12-03 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-16879.
-
Resolution: Duplicate

> s3a mkdirs() to not check dest for a dir marker
> ---
>
> Key: HADOOP-16879
> URL: https://issues.apache.org/jira/browse/HADOOP-16879
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.1
>Reporter: Steve Loughran
>Priority: Major
>
> S3A innerMkdirs() calls getFileStatus() to probe dest path for being a file 
> or dir, then goes to reject/no-op.
> The HEAD path + / in that code may add a 404 to the S3 load balancers, so 
> subsequent probes for the path fail. 
> Proposed: only look for file then LIST underneath
> if no entry found: probe for parent being a dir (LIST; HEAD + /), if true 
> create the marker entry. If not, start the walk (or should we then check?)
> This increases the cost of mkdir on an existing empty dir marker; reduces it 
> on a non-empty dir. Creates dir markers above dir markers to avoid those 
> cached 404s.
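The probe sequence proposed above can be sketched as follows. All helper names (probeForFile, listUnderPath, putDirMarker) are hypothetical stand-ins, not the actual S3A internals:

```java
/** Illustrative mkdirs(path) probe order: HEAD for a file, LIST for
 *  children, then unconditionally PUT the marker. */
abstract class MkdirsSketch {
    abstract boolean probeForFile(String path);   // HEAD on path (no trailing /)
    abstract boolean listUnderPath(String path);  // LIST under path + "/"
    abstract void putDirMarker(String path);      // PUT path + "/"

    boolean mkdirs(String path) {
        if (probeForFile(path)) {
            // reject: a file exists where the directory should go
            throw new IllegalStateException("Not a directory: " + path);
        }
        if (listUnderPath(path)) {
            // non-empty directory already exists: no-op, and crucially no
            // HEAD path + "/" was issued that could cache a 404
            return true;
        }
        // nothing found: create the marker without first probing for it
        putDirMarker(path);
        return true;
    }
}

public class MkdirsDemo {
    public static void main(String[] args) {
        MkdirsSketch fs = new MkdirsSketch() {
            boolean probeForFile(String p) { return false; }
            boolean listUnderPath(String p) { return false; }
            void putDirMarker(String p) { System.out.println("PUT " + p + "/"); }
        };
        fs.mkdirs("bucket/dir");  // prints: PUT bucket/dir/
    }
}
```

This matches the cost trade-off stated in the issue: an extra PUT on an existing empty marker, one probe fewer on a non-empty directory.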






[jira] [Resolved] (HADOOP-16804) s3a mkdir path/ can add 404 to S3 load balancers

2020-12-03 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-16804.
-
Resolution: Not A Problem

AWS S3 is now consistent. No more 404 caching.

> s3a mkdir path/ can add 404 to S3 load balancers
> 
>
> Key: HADOOP-16804
> URL: https://issues.apache.org/jira/browse/HADOOP-16804
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Priority: Major
>
> Not seen in the wild, but inferred from a code review
> mkdirs creates an empty dir marker, but it looks for one first
> proposed
>  * only look for file marker and LIST; don't worry about an empty dir marker
>  * always PUT one there
> Saves a HEAD on every mkdir too






[jira] [Resolved] (HADOOP-16863) Report on S3A cached 404 recovery better

2020-12-03 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-16863.
-
Resolution: Not A Problem

AWS S3 is now consistent. No more 404 caching.

> Report on S3A cached 404 recovery better
> 
>
> Key: HADOOP-16863
> URL: https://issues.apache.org/jira/browse/HADOOP-16863
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Priority: Major
>
> A big hadoop fs -copyFromLocal is showing that 404 caching is still 
> happening. 
> {code}
> 20/02/13 01:02:18 WARN s3a.S3AFileSystem: Failed to find file 
> s3a://dilbert/dogbert/queries_split_1/catberg.q._COPYING_. Either it is not 
> yet visible, or it has been deleted.
> 20/02/13 01:02:18 WARN s3a.S3AFileSystem: Failed to find file 
> s3a://dilbert/dogbert/queries_split_1/catberg.q._COPYING_. Either it is not 
> yet visible, or it has been deleted.
> {code}
> We are recovering (good) but it's (a) got the people running this code 
> worried and (b) shouldn't be happening.
> Proposed
> * error message to link to a (new) wiki doc on the topic.
> * retried clause to increment counter & if count >1 report on #of attempts 
> and duration
> * S3A FS.deleteOnExit to avoid all checks
> * and review the copyFromLocal to make sure no other probes are happening






[jira] [Created] (HADOOP-17406) enable s3a magic committer by default

2020-12-02 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17406:
---

 Summary: enable s3a magic committer by default
 Key: HADOOP-17406
 URL: https://issues.apache.org/jira/browse/HADOOP-17406
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Now that AWS S3 is consistent, we can safely enable the magic committer 
everywhere. Change the setting.






[jira] [Created] (HADOOP-17403) S3A ITestPartialRenamesDeletes failure: missing directory marker

2020-11-30 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17403:
---

 Summary: S3A ITestPartialRenamesDeletes failure: missing directory 
marker
 Key: HADOOP-17403
 URL: https://issues.apache.org/jira/browse/HADOOP-17403
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.1
Reporter: Steve Loughran


Seemingly transient failure of the test ITestPartialRenamesDeletes with the 
latest HADOOP-17244 changes applied: an expected directory marker was not found.

Test run was (unintentionally) sequential, markers=delete, s3guard on
{code}
-Dmarkers=delete -Ds3guard -Ddynamo -Dscale 
{code}

Hasn't come back since.

The bucket's retention policy was authoritative, but no dirs were declared as 
such






[jira] [Resolved] (HADOOP-16923) mkdir on s3a should not be sensitive to trailing '/'

2020-11-30 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-16923.
-
Resolution: Duplicate

> mkdir on s3a should not be sensitive to trailing '/'
> 
>
> Key: HADOOP-16923
> URL: https://issues.apache.org/jira/browse/HADOOP-16923
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Reporter: Zoltan Haindrich
>Priority: Minor
> Fix For: 3.3.0
>
>
> I would have expected to create the directory for both calls:
> {code}
> [hive@hiveserver2-0 lib]$ hdfs dfs -mkdir 
> s3a://qe-s3-bucket-mst-xfpn-dwx-external/custom-jars2/
> /usr/bin/hdfs: line 4: /usr/lib/bigtop-utils/bigtop-detect-javahome: No such 
> file or directory
> log4j:WARN No appenders could be found for logger 
> (org.apache.hadoop.util.Shell).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> mkdir: get on s3a://qe-s3-bucket-mst-xfpn-dwx-external/custom-jars2/: 
> com.amazonaws.services.dynamodbv2.model.AmazonDynamoDBException: One or more 
> parameter values were invalid: An AttributeValue may not contain an empty 
> string (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ValidationException; Request ID: 
> 1AE0KA2Q5ADI47R75N8BDJE973VV4KQNSO5AEMVJF66Q9ASUAAJG): One or more parameter 
> values were invalid: An AttributeValue may not contain an empty string 
> (Service: AmazonDynamoDBv2; Status Code: 400; Error Code: 
> ValidationException; Request ID: 
> 1AE0KA2Q5ADI47R75N8BDJE973VV4KQNSO5AEMVJF66Q9ASUAAJG)
> [hive@hiveserver2-0 lib]$ hdfs dfs -mkdir 
> s3a://qe-s3-bucket-mst-xfpn-dwx-external/custom-jars2
> /usr/bin/hdfs: line 4: /usr/lib/bigtop-utils/bigtop-detect-javahome: No such 
> file or directory
> log4j:WARN No appenders could be found for logger 
> (org.apache.hadoop.util.Shell).
> log4j:WARN Please initialize the log4j system properly.
> log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more 
> info.
> [hive@hiveserver2-0 lib]$ 
> {code}






[jira] [Created] (HADOOP-17400) Optimize S3A for maximum performance in directory listings

2020-11-30 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17400:
---

 Summary: Optimize S3A for maximum performance in directory listings
 Key: HADOOP-17400
 URL: https://issues.apache.org/jira/browse/HADOOP-17400
 Project: Hadoop Common
  Issue Type: Improvement
  Components: fs/s3
Affects Versions: 3.3.0
Reporter: Steve Loughran









[jira] [Resolved] (HADOOP-15844) tag S3GuardTool entry points as limitedPrivate("management-tools")/evolving

2020-11-27 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-15844.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

Came in with HADOOP-13280 directory marker support

> tag S3GuardTool entry points as limitedPrivate("management-tools")/evolving
> ---
>
> Key: HADOOP-15844
> URL: https://issues.apache.org/jira/browse/HADOOP-15844
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 20m
>  Remaining Estimate: 0h
>
> The S3Guard tool static entry points are the API For management tools to work 
> with S3Guard. They need to be declared as a public API and their stability 
> "evolving" stated






[jira] [Resolved] (HADOOP-16382) Clock skew can cause S3Guard to think object metadata is out of date

2020-11-27 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-16382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-16382.
-
Resolution: Won't Fix

We just don't get enough information back. It would be nice if an HTTP response 
header always included clock time, but, well...

> Clock skew can cause S3Guard to think object metadata is out of date
> 
>
> Key: HADOOP-16382
> URL: https://issues.apache.org/jira/browse/HADOOP-16382
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Priority: Minor
>
> When a S3Guard entry is added for an object, its last updated flag is taken 
> from the local clock: if a getFileStatus is made immediately afterwards, the 
> timestamp of the file from the HEAD may be greater than the local time, so the 
> DDB entry is updated.
> This is even if the clocks are *close*. When updating an entry from S3, the 
> actual timestamp of the file should be used to fix it, not local clocks






[jira] [Resolved] (HADOOP-17398) Skipping network I/O in S3A getFileStatus(/) breaks some tests

2020-11-26 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17398.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

> Skipping network I/O in S3A getFileStatus(/) breaks some tests
> --
>
> Key: HADOOP-17398
> URL: https://issues.apache.org/jira/browse/HADOOP-17398
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.3.1
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 1h 20m
>  Remaining Estimate: 0h
>
> [*ERROR*]   
> *ITestS3ABucketExistence.testNoBucketProbing:65->expectUnknownStore:102 
> Expected a org.apache.hadoop.fs.s3a.UnknownStoreException to be thrown, but 
> got the result: : 
> S3AFileStatus\{path=s3a://random-bucket-57a85110-c715-4db2-a049-e43999d7e51b/;
>  isDirectory=true; modification_time=0; access_time=0; owner=mthakur; 
> group=mthakur; permission=rwxrwxrwx; isSymlink=false; hasAcl=false; 
> isEncrypted=true; isErasureCoded=false} isEmptyDirectory=UNKNOWN eTag=null 
> versionId=null*
> [*ERROR*]   *ITestS3AInconsistency.testGetFileStatus:114->Assert.fail:88 
> getFileStatus should fail due to delayed visibility.*
>  
> *ITestMarkerTool.testRunWrongBucket:227->AbstractMarkerToolTest.runToFailure:276
>  » UnknownStore*






[jira] [Resolved] (HADOOP-17385) ITestS3ADeleteCost.testDirMarkersFileCreation failure

2020-11-26 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17385.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

> ITestS3ADeleteCost.testDirMarkersFileCreation failure
> -
>
> Key: HADOOP-17385
> URL: https://issues.apache.org/jira/browse/HADOOP-17385
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 2h 40m
>  Remaining Estimate: 0h
>
> Test runs after apply of recent patches HADOOP-17318 and HADOOP-17244 are 
> showing a failure in ITestS3ADeleteCost.testDirMarkersFileCreation
> Either one of the patches is failing and wasn't picked up, or the combined 
> changes are conflicting in some way. Not backporting HADOOP-17318 to 
> branch-3.3 until this is addressed.






[jira] [Resolved] (HADOOP-17318) S3A committer to support concurrent jobs with same app attempt ID & dest dir

2020-11-26 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17318?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17318.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

> S3A committer to support concurrent jobs with same app attempt ID & dest dir
> 
>
> Key: HADOOP-17318
> URL: https://issues.apache.org/jira/browse/HADOOP-17318
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 8h 10m
>  Remaining Estimate: 0h
>
> Reported failure of magic committer block uploads as pending upload ID is 
> unknown. Likely cause: it's been aborted by another job
> # Make it possible to turn off cleanup of pending uploads in magic committer
> # log more about uploads being deleted in committers
> # and upload ID in the S3aBlockOutputStream errors
> There are other concurrency issues when you look close, see SPARK-33230
> * magic committer uses app attempt ID as path under __magic; if there are 
> duplicates then they will conflict
> * staging committer local temp dir uses app attempt id
> Fix will be to have a job UUID which for spark will be picked up from the 
> SPARK-33230 changes, (option to self-generate in job setup for hadoop 3.3.1+ 
> older spark builds); fall back to app-attempt *unless that fallback has been 
> disabled*
> MR: configure to use app attempt ID
> Spark: configure to fail job setup if app attempt ID is the source of a job 
> uuid






[jira] [Resolved] (HADOOP-17313) FileSystem.get to support slow-to-instantiate FS clients

2020-11-25 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17313.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

> FileSystem.get to support slow-to-instantiate FS clients
> 
>
> Key: HADOOP-17313
> URL: https://issues.apache.org/jira/browse/HADOOP-17313
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs, fs/azure, fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 4h 50m
>  Remaining Estimate: 0h
>
> A recurrent problem in processes with many worker threads (hive, spark etc) 
> is that calling `FileSystem.get(URI-to-object-store)` triggers the creation 
> and then discarding of many FS clients, all but one for the same URL. As well as 
> the direct performance hit, this can exacerbate locking problems and make 
> instantiation a lot slower than it would otherwise be.
> This has been observed with the S3A and ABFS connectors.
> The ultimate solution here would probably be something more complicated to 
> ensure that only one thread was ever creating a connector for a given URL 
> -the rest would wait for it to be initialized. This would (a) reduce 
> contention & CPU, IO network load, and (b) reduce the time for all but the 
> first thread to resume processing to that of the remaining time in 
> .initialize(). This would also benefit the S3A connector.
> We'd need something like
> # A (per-user) map of filesystems being created 
> # split createFileSystem into two: instantiateFileSystem and 
> initializeFileSystem
> # each thread to instantiate the FS, put() it into the new map
> # If there was one already, discard the old one and wait for the new one to 
> be ready via a call to Object.wait()
> # If there wasn't an entry, call initializeFileSystem) and then, finally, 
> call Object.notifyAll(), and move it from the map of filesystems being 
> initialized to the map of created filesystems
> This sounds too straightforward to be that simple; the troublespots are 
> probably related to race conditions moving entries between the two maps and 
> making sure that no thread will block on the FS being initialized while it 
> has already been initialized (and so wait() will block forever).
> Rather than seek perfection, it may be safest go for a best-effort 
> optimisation of the #of FS instances created/initialized. That is: its better 
> to maybe create a few more FS instances than needed than it is to block 
> forever.
> Something is doable here, it's just not quick-and-dirty. Testing will be 
> "fun"; probably best to isolate this new logic somewhere where we can 
> simulate slow starts on one thread with many other threads waiting for it.
> A simpler option would be to have a lock on the construction process: only 
> one FS can be instantiated per user at a time.
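The initialize-once scheme sketched in the steps above can be modelled with a per-key future instead of explicit wait()/notifyAll(); the types here are illustrative, not Hadoop's actual FileSystem cache:

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

/** Best-effort single-initialization cache: the first thread for a key runs
 *  the slow init; later threads block on its future instead of building
 *  their own instance. */
class InitOnceCache<K, V> {
    private final ConcurrentMap<K, CompletableFuture<V>> cache =
        new ConcurrentHashMap<>();

    V get(K key, Supplier<V> slowInit) {
        CompletableFuture<V> f = new CompletableFuture<>();
        CompletableFuture<V> prior = cache.putIfAbsent(key, f);
        if (prior != null) {
            return prior.join();  // wait for (or reuse) the winner's instance
        }
        try {
            V v = slowInit.get(); // only this thread runs the slow initialize()
            f.complete(v);
            return v;
        } catch (RuntimeException e) {
            cache.remove(key, f);        // allow a later caller to retry
            f.completeExceptionally(e);  // wake any waiters with the failure
            throw e;
        }
    }
}

public class FsCacheDemo {
    public static void main(String[] args) {
        InitOnceCache<String, String> cache = new InitOnceCache<>();
        String a = cache.get("s3a://bucket", () -> "client-1"); // runs init
        String b = cache.get("s3a://bucket", () -> "client-2"); // reuses a
        System.out.println(a.equals(b)); // true
    }
}
```

This is the "best-effort" shape the issue settles on: a failed init is evicted so a later caller can retry, and in a race it is still possible (and acceptable) to construct slightly more than one instance across retries.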






[jira] [Resolved] (HADOOP-17325) WASB: Test failures

2020-11-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17325.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

Ok, fixed in branches-3.3 and trunk. Thanks to all for their contributions.

As noted on the PR, I can't create a WASB account I can regression-test with. 
Ignoring that for now.

> WASB: Test failures
> ---
>
> Key: HADOOP-17325
> URL: https://issues.apache.org/jira/browse/HADOOP-17325
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/azure, test
>Affects Versions: 3.3.0
>Reporter: Sneha Vijayarajan
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> WASB tests are failing in Apache trunk resulting in Yetus run failures for 
> PRs.
>  
> ||Reason||Tests||
> |Failed junit tests|hadoop.fs.azure.TestNativeAzureFileSystemMocked|
> | |hadoop.fs.azure.TestNativeAzureFileSystemConcurrency|
> | |hadoop.fs.azure.TestWasbFsck|
> | |hadoop.fs.azure.TestNativeAzureFileSystemOperationsMocked|
> | |hadoop.fs.azure.TestNativeAzureFileSystemFileNameCheck|
> | |hadoop.fs.azure.TestNativeAzureFileSystemContractMocked|
> | |hadoop.fs.azure.TestOutOfBandAzureBlobOperations|
> | |hadoop.fs.azure.TestBlobMetadata|
> Many PRs are hit by this. Test report link from one of the PRs:
> [https://ci-hadoop.apache.org/job/hadoop-multibranch/job/PR-2368/5/testReport/]
>  






[jira] [Resolved] (HADOOP-17343) Upgrade aws-java-sdk to 1.11.901

2020-11-23 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17343.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

Merged to trunk and branch-3.3; PR covers qualification testing

> Upgrade aws-java-sdk to 1.11.901
> 
>
> Key: HADOOP-17343
> URL: https://issues.apache.org/jira/browse/HADOOP-17343
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/s3
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 4h
>  Remaining Estimate: 0h
>
> Upgrade AWS SDK to most recent version






[jira] [Resolved] (HADOOP-17390) Skip license check on lz4 code files

2020-11-20 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17390.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

+1, merged to trunk. Thanks!

> Skip license check on lz4 code files
> 
>
> Key: HADOOP-17390
> URL: https://issues.apache.org/jira/browse/HADOOP-17390
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: build
>Reporter: Zhihua Deng
>Assignee: Zhihua Deng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>  Time Spent: 50m
>  Remaining Estimate: 0h
>
> There are some files failing the asflicense check, which will make _This 
> commit cannot be built_ appear when building; this is caused by the following 
> two files:
> {noformat}
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/lz4/lz4.c
>   
> hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-nativetask/src/main/native/lz4/lz4.h{noformat}
> The two do not have an Apache license header; maybe we should skip checking 
> them to let the builds go through.
>  






[jira] [Resolved] (HADOOP-17388) AbstractS3ATokenIdentifier to issue date in UTC

2020-11-20 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17388.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

+1, merged to 3.3+

> AbstractS3ATokenIdentifier to issue date in UTC
> ---
>
> Key: HADOOP-17388
> URL: https://issues.apache.org/jira/browse/HADOOP-17388
> Project: Hadoop Common
>  Issue Type: Sub-task
>Reporter: Steve Loughran
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Follow on to HADOOP-17379: I think we should go to UTC and 
> Time.now()/System.currentTimeMillis()
> All uses of the field in hadoop-* seem to assume UTC






[jira] [Created] (HADOOP-17388) AbstractS3ATokenIdentifier to issue date in UTC

2020-11-19 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17388:
---

 Summary: AbstractS3ATokenIdentifier to issue date in UTC
 Key: HADOOP-17388
 URL: https://issues.apache.org/jira/browse/HADOOP-17388
 Project: Hadoop Common
  Issue Type: Bug
Reporter: Steve Loughran


Follow on to HADOOP-17379: I think we should go to UTC and 
Time.now()/System.currentTimeMillis()

All uses of the field in hadoop-* seem to assume UTC







[jira] [Created] (HADOOP-17386) fs.s3a.buffer.dir to be under Yarn container path on yarn applications

2020-11-18 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17386:
---

 Summary: fs.s3a.buffer.dir to be under Yarn container path on yarn 
applications
 Key: HADOOP-17386
 URL: https://issues.apache.org/jira/browse/HADOOP-17386
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.0
Reporter: Steve Loughran


# fs.s3a.buffer.dir defaults to hadoop.tmp.dir which is /tmp or similar
# we use this for storing file blocks during upload
# staging committers use it for all files in a task, which can be a lot more
# a lot of systems don't clean up /tmp until reboot - and if they stay up for 
a long time, they accrue files written through the s3a staging committer by 
spark containers which fail

Fix: use ${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/s3a as the option so that if 
env.LOCAL_DIRS is set, it is used over hadoop.tmp.dir. YARN-deployed apps will 
use that for the buffer dir. When the app container is destroyed, so is the 
directory.
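The proposed default leans on Hadoop's ${env.VAR:-fallback} property expansion. A minimal sketch of that fallback logic, using a hypothetical helper rather than Hadoop's Configuration class:

```java
import java.util.Map;

// Hypothetical helper showing the ${env.LOCAL_DIRS:-${hadoop.tmp.dir}}/s3a
// fallback: prefer the YARN-provided LOCAL_DIRS, else hadoop.tmp.dir.
class BufferDirResolver {
    static String resolveBufferDir(Map<String, String> env, String hadoopTmpDir) {
        String localDirs = env.get("LOCAL_DIRS");
        String base = (localDirs != null && !localDirs.isEmpty())
            ? localDirs : hadoopTmpDir;
        return base + "/s3a";
    }
}
```

Inside a YARN container LOCAL_DIRS is set, so blocks land under the container's working dirs and are deleted with the container; elsewhere the behaviour is unchanged.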








[jira] [Created] (HADOOP-17385) ITestS3ADeleteCost.testDirMarkersFileCreation failure

2020-11-18 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17385:
---

 Summary: ITestS3ADeleteCost.testDirMarkersFileCreation failure
 Key: HADOOP-17385
 URL: https://issues.apache.org/jira/browse/HADOOP-17385
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Test runs after applying the recent patches HADOOP-17318 and HADOOP-17244 are 
showing a failure in ITestS3ADeleteCost.testDirMarkersFileCreation

Either one of the patches is failing and wasn't picked up, or the combined 
changes are conflicting in some way. Not backporting HADOOP-17318 to branch-3.3 
until this is addressed.






[jira] [Resolved] (HADOOP-17244) HADOOP-17244. S3A directory delete tombstones dir markers prematurely.

2020-11-18 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17244.
-
Resolution: Fixed

> HADOOP-17244. S3A directory delete tombstones dir markers prematurely.
> --
>
> Key: HADOOP-17244
> URL: https://issues.apache.org/jira/browse/HADOOP-17244
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Blocker
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 5h 10m
>  Remaining Estimate: 0h
>
> Test failure: 
> {{ITestS3AFileContextMainOperations#testRenameDirectoryAsNonExistentDirectory}}
> This is repeatable on -Dauth runs (we haven't been running them, have we?)
> Either its from the recent dir marker changes (initial hypothesis) or its 
> been lurking a while and not been picked up.






[jira] [Resolved] (HADOOP-17379) AbstractS3ATokenIdentifier to set issue date == now

2020-11-17 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17379.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

> AbstractS3ATokenIdentifier to set issue date == now
> ---
>
> Key: HADOOP-17379
> URL: https://issues.apache.org/jira/browse/HADOOP-17379
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Steve Loughran
>Assignee: Jungtaek Lim
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 1h 40m
>  Remaining Estimate: 0h
>
> The AbstractS3ATokenIdentifier DT identifier doesn't set the issue date to 
> the current time (unlike ABFS); this confuses spark (SPARK-33440)






[jira] [Reopened] (HADOOP-17343) Upgrade aws-java-sdk to 1.11.901

2020-11-17 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HADOOP-17343:
-

> Upgrade aws-java-sdk to 1.11.901
> 
>
> Key: HADOOP-17343
> URL: https://issues.apache.org/jira/browse/HADOOP-17343
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: build, fs/s3
>Affects Versions: 3.3.1, 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Steve Loughran
>Priority: Minor
>  Labels: pull-request-available
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>







[jira] [Created] (HADOOP-17380) ITestS3AContractSeek.teardown closes FS before superclass does its cleanup

2020-11-16 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17380:
---

 Summary: ITestS3AContractSeek.teardown closes FS before superclass 
does its cleanup
 Key: HADOOP-17380
 URL: https://issues.apache.org/jira/browse/HADOOP-17380
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.3.0
Reporter: Steve Loughran


ITestS3AContractSeek.teardown closes the FS, but because it does it before 
calling super.teardown, the superclass doesn't get the opportunity to delete 
the test dirs.

Proposed: change the order. 
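The proposed ordering can be sketched as follows; class and method names are illustrative, not the actual contract-test classes:

```java
// Illustrative sketch of the fix: run the superclass teardown (which
// deletes the test dirs) before closing the filesystem.
class ContractTestBase {
    final StringBuilder log = new StringBuilder();

    public void teardown() {
        log.append("deleteTestDirs;"); // superclass cleanup needs a live FS
    }
}

class SeekContractTest extends ContractTestBase {
    @Override
    public void teardown() {
        super.teardown();       // let the superclass delete test dirs first
        log.append("closeFS;"); // only then close the filesystem
    }
}
```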






[jira] [Created] (HADOOP-17379) AbstractS3ATokenIdentifier to set issue date == now

2020-11-16 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17379:
---

 Summary: AbstractS3ATokenIdentifier to set issue date == now
 Key: HADOOP-17379
 URL: https://issues.apache.org/jira/browse/HADOOP-17379
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.0
Reporter: Steve Loughran


The AbstractS3ATokenIdentifier DT identifier doesn't set the issue date to the 
current time (unlike ABFS); this confuses spark (SPARK-33440)






[jira] [Resolved] (HADOOP-17134) S3AFileSystem.listLocatedStatus(file) does a LIST even with S3Guard

2020-11-16 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17134.
-
Resolution: Won't Fix

> S3AFileSystem.listLocatedStatus(file) does a LIST even with S3Guard
> --
>
> Key: HADOOP-17134
> URL: https://issues.apache.org/jira/browse/HADOOP-17134
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Mukund Thakur
>Priority: Minor
>
> This is minor and we may want to WONTFIX; noticed during work on directory 
> markers.
> If you call listLocatedStatus(file) then a LIST call is always made to S3, 
> even when S3Guard is present and has the record to say "this is a file"
> Does this matter enough to fix? 
> # The HADOOP-16465 work moved the list before falling back to getFileStatus
> # that listing calls s3guard.listChildren(path) to list the children.
> # which only returns the children of a path, not a record of the path itself.
> # so we get an empty list back, triggering the LIST
> # it's only after that LIST fails that we fall back to getFileStatus and hence 
> look for the actual file record.
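The steps above can be sketched as a lookup chain; the interfaces here are illustrative stand-ins for the S3Guard metadata store, not its real API:

```java
import java.util.List;
import java.util.Optional;

// Illustrative stand-ins for the S3Guard metadata store, showing why the
// listing-first order always costs a LIST for a plain file.
interface MetaStore {
    List<String> listChildren(String path);   // children only, no self record
    Optional<String> getRecord(String path);  // per-path record lookup
}

class LookupSketch {
    // returns the sequence of store operations needed to locate the path
    static String locate(MetaStore store, String path) {
        if (!store.listChildren(path).isEmpty()) {
            return "LIST"; // a directory with children: one listing suffices
        }
        // empty listing: only now fall back to the per-path record
        return store.getRecord(path).isPresent()
            ? "LIST+GET" : "LIST+GET+MISS";
    }
}
```

For a file the child listing is always empty, so the LIST is wasted work before the per-path record is consulted.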






[jira] [Created] (HADOOP-17376) ITestS3AContractRename failing against stricter tests

2020-11-12 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17376:
---

 Summary: ITestS3AContractRename failing against stricter tests
 Key: HADOOP-17376
 URL: https://issues.apache.org/jira/browse/HADOOP-17376
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3, test
Affects Versions: 3.3.1
Reporter: Steve Loughran


HADOOP-17365 tightened the contract test for rename over file; S3A is failing.






[jira] [Created] (HADOOP-17372) S3A AWS Credential provider loading gets confused with isolated classloaders

2020-11-10 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17372:
---

 Summary: S3A AWS Credential provider loading gets confused with 
isolated classloaders
 Key: HADOOP-17372
 URL: https://issues.apache.org/jira/browse/HADOOP-17372
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran


Problem: exception in loading S3A credentials for an FS, "Class class 
com.amazonaws.auth.EnvironmentVariableCredentialsProvider does not implement 
AWSCredentialsProvider"

Location: S3A + Spark dataframes test

Hypothesised cause:

Configuration.getClasses() uses the context classloader, and under spark's 
isolated classloader that's different from the one the s3a FS uses, so it 
can't load the AWS credential providers.
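One conventional fix for this class of bug is to resolve classes against the classloader of a known anchor class rather than the thread context classloader; a minimal sketch (illustrative, not the Configuration.getClasses() code):

```java
// Illustrative sketch: load a named class with the classloader of a known
// anchor class (e.g. the provider interface), not the context classloader.
class PluginLoader {
    static Class<?> loadWith(Class<?> anchor, String name)
            throws ClassNotFoundException {
        ClassLoader cl = anchor.getClassLoader();
        if (cl == null) {
            // anchor was loaded by the bootstrap loader; fall back
            cl = ClassLoader.getSystemClassLoader();
        }
        return Class.forName(name, false, cl);
    }
}
```

Because the anchor and the plugin then come from the same loader hierarchy, the loaded class passes `instanceof` checks against the interface.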






[jira] [Resolved] (HADOOP-17066) S3A staging committer committing duplicate files

2020-11-09 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17066?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17066.
-
Resolution: Duplicate

HADOOP-17318 covers the duplicate job problem everywhere in the committer; 
combined with a change in spark, the issue should go away.

This is an intermittent issue: it depends on the timing with which stages are 
launched and, for the task staging dir conflict, on whether two task attempts 
of conflicting jobs run at the same time.

> S3A staging committer committing duplicate files
> 
>
> Key: HADOOP-17066
> URL: https://issues.apache.org/jira/browse/HADOOP-17066
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0, 3.2.1, 3.1.3
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>
> SPARK-39111 reporting concurrent jobs double writing files.






[jira] [Created] (HADOOP-17366) hadoop-cloud-storage transient dependencies need review

2020-11-09 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17366:
---

 Summary: hadoop-cloud-storage transient dependencies need review
 Key: HADOOP-17366
 URL: https://issues.apache.org/jira/browse/HADOOP-17366
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: build, fs, fs/azure
Affects Versions: 3.4.0
Reporter: Steve Loughran


A review of the hadoop cloud storage dependencies shows that things are 
creeping in there

{code}
[INFO] |  +- org.apache.hadoop:hadoop-cloud-storage:jar:3.1.4:compile
[INFO] |  |  +- (org.apache.hadoop:hadoop-annotations:jar:3.1.4:compile - 
omitted for duplicate)
[INFO] |  |  +- org.apache.hadoop:hadoop-aliyun:jar:3.1.4:compile
[INFO] |  |  |  \- com.aliyun.oss:aliyun-sdk-oss:jar:3.4.1:compile
[INFO] |  |  | +- org.jdom:jdom:jar:1.1:compile
[INFO] |  |  | +- org.codehaus.jettison:jettison:jar:1.1:compile
[INFO] |  |  | |  \- stax:stax-api:jar:1.0.1:compile
[INFO] |  |  | +- com.aliyun:aliyun-java-sdk-core:jar:3.4.0:compile
[INFO] |  |  | +- com.aliyun:aliyun-java-sdk-ram:jar:3.0.0:compile
[INFO] |  |  | +- com.aliyun:aliyun-java-sdk-sts:jar:3.0.0:compile
[INFO] |  |  | \- com.aliyun:aliyun-java-sdk-ecs:jar:4.2.0:compile
[INFO] |  |  +- (org.apache.hadoop:hadoop-aws:jar:3.1.4:compile - omitted for 
duplicate)
[INFO] |  |  +- (org.apache.hadoop:hadoop-azure:jar:3.1.4:compile - omitted for 
duplicate)
[INFO] |  |  +- org.apache.hadoop:hadoop-azure-datalake:jar:3.1.4:compile
[INFO] |  |  |  \- 
com.microsoft.azure:azure-data-lake-store-sdk:jar:2.2.7:compile
[INFO] |  |  | \- (org.slf4j:slf4j-api:jar:1.7.21:compile - omitted for 
conflict with 1.7.30)
{code}

Need to review and cut things which already come in via hadoop-common (slf4j, 
maybe some of the aliyun stuff)






[jira] [Created] (HADOOP-17340) TestLdapGroupsMapping failing -string mismatch in exception validation

2020-11-02 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17340:
---

 Summary: TestLdapGroupsMapping failing -string mismatch in 
exception validation
 Key: HADOOP-17340
 URL: https://issues.apache.org/jira/browse/HADOOP-17340
 Project: Hadoop Common
  Issue Type: Bug
  Components: test
Affects Versions: 3.3.1
Reporter: Steve Loughran


Looks like a change in the exception strings is breaking the validation code
{code}
[ERROR]   TestLdapGroupsMapping.testLdapReadTimeout:447  Expected to find 'LDAP 
response read timed out, timeout used:4000ms' but got unexpected exception: 
javax.naming.NamingException: LDAP response read timed out, timeout used: 4000 
ms.; remaining name ''
at com.sun.jndi.ldap.LdapRequest.getReplyBer(LdapRequest.java:129)
{code}






[jira] [Reopened] (HADOOP-17312) S3AInputStream to be resilient to failures in abort(); translate AWS Exceptions

2020-11-02 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran reopened HADOOP-17312:
-

> S3AInputStream to be resilient to failures in abort(); translate AWS Exceptions
> --
>
> Key: HADOOP-17312
> URL: https://issues.apache.org/jira/browse/HADOOP-17312
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.0, 3.2.1
>Reporter: Steve Loughran
>Priority: Major
>
> Stack overflow issue complaining about ConnectionClosedException during 
> S3AInputStream close(), seems triggered by an EOF exception in abort. That 
> is: we are trying to close the stream and it is failing because the stream is 
> closed. oops.
> https://stackoverflow.com/questions/64412010/pyspark-org-apache-http-connectionclosedexception-premature-end-of-content-leng
> Looking @ the stack, we aren't translating AWS exceptions in abort() to IOEs, 
> which may be a factor.






[jira] [Created] (HADOOP-17335) s3a listing operation will fail in async prefetch if fs closed

2020-10-28 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17335:
---

 Summary: s3a listing operation will fail in async prefetch if fs 
closed
 Key: HADOOP-17335
 URL: https://issues.apache.org/jira/browse/HADOOP-17335
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.4.0
Reporter: Steve Loughran
Assignee: Mukund Thakur


The async prefetch logic in the S3A listing code gets into trouble if the FS 
is closed while an async listing is in progress.

In this situation we should recognise the condition and convert it into some 
FS-is-closed exception.






[jira] [Created] (HADOOP-17332) S3A marker tool mixes up -min and -max

2020-10-27 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17332:
---

 Summary: S3A marker tool mixes up -min and -max
 Key: HADOOP-17332
 URL: https://issues.apache.org/jira/browse/HADOOP-17332
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.1
Reporter: Steve Loughran
Assignee: Steve Loughran


HADOOP-17727 manages to get -min and -max mixed up through the call chain.

{code}
hadoop s3guard markers -audit -max 2000  s3a://stevel-london/
{code}
leads to
{code}
2020-10-27 18:11:44,434 [main] DEBUG s3guard.S3GuardTool 
(S3GuardTool.java:main(2154)) - Exception raised
46: Marker count 0 out of range [2000 - 0]
at 
org.apache.hadoop.fs.s3a.tools.MarkerTool$ScanResult.finish(MarkerTool.java:489)
at org.apache.hadoop.fs.s3a.tools.MarkerTool.run(MarkerTool.java:318)
at 
org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:505)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
at 
org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.run(S3GuardTool.java:2134)
at 
org.apache.hadoop.fs.s3a.s3guard.S3GuardTool.main(S3GuardTool.java:2146)
2020-10-27 18:11:44,436 [main] INFO  util.ExitUtil 
(ExitUtil.java:terminate(210)) - Exiting with status 46: 46: Marker count 0 out 
of range [2000 - 0]
{code}

Trivial fix.
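The swap can be reproduced with a tiny range check; names here are illustrative, not the MarkerTool code:

```java
// Illustrative range check; passing min and max in the wrong order yields
// the impossible range [2000 - 0] reported above.
class MarkerCount {
    static String check(int count, int min, int max) {
        if (count < min || (max > 0 && count > max)) {
            return String.format(
                "Marker count %d out of range [%d - %d]", count, min, max);
        }
        return "OK";
    }
}
```

Calling `check(0, 2000, 0)` (arguments swapped at the call site) reproduces the error text, while `check(0, 0, 2000)` passes.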






[jira] [Resolved] (HADOOP-17308) WASB : PageBlobOutputStream succeeding hflush even when underlying flush to storage failed

2020-10-26 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17308.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

> WASB : PageBlobOutputStream succeeding hflush even when underlying flush to 
> storage failed 
> ---
>
> Key: HADOOP-17308
> URL: https://issues.apache.org/jira/browse/HADOOP-17308
> Project: Hadoop Common
>  Issue Type: Bug
>Affects Versions: 2.7.0
>Reporter: Anoop Sam John
>Assignee: Anoop Sam John
>Priority: Critical
>  Labels: HBASE, pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> In PageBlobOutputStream, write()  APIs will fill the buffer and 
> hflush/hsync/flush call will flush the buffer to underlying storage. Here the 
> Azure calls are handled in another thread 
> {code}
> private synchronized void flushIOBuffers()  {
> ...
> lastQueuedTask = new WriteRequest(outBuffer.toByteArray());
> ioThreadPool.execute(lastQueuedTask);
>   
>  }
> private class WriteRequest implements Runnable {
> private final byte[] dataPayload;
> private final CountDownLatch doneSignal = new CountDownLatch(1);
> public WriteRequest(byte[] dataPayload) {
>   this.dataPayload = dataPayload;
> }
> public void waitTillDone() throws InterruptedException {
>   doneSignal.await();
> }
> @Override
> public void run() {
>   try {
> LOG.debug("before runInternal()");
> runInternal();
> LOG.debug("after runInternal()");
>   } finally {
> doneSignal.countDown();
>   }
> }
> private void runInternal() {
>   ..
>   writePayloadToServer(rawPayload);
>   ...
> }
> private void writePayloadToServer(byte[] rawPayload) {
>   ..
>   try {
> blob.uploadPages(wrapperStream, currentBlobOffset, rawPayload.length,
> withMD5Checking(), PageBlobOutputStream.this.opContext);
>   } catch (IOException ex) {
> lastError = ex;
>   } catch (StorageException ex) {
> lastError = new IOException(ex);
>   }
>   if (lastError != null) {
> LOG.debug("Caught error in 
> PageBlobOutputStream#writePayloadToServer()");
>   }
> }
>   }
> {code}
> The flushing thread will wait for the other thread to complete the Runnable 
> WriteRequest. That's fine. But when some exception happens in 
> blob.uploadPages, we just record it in the lastError state variable. This 
> variable is checked in all subsequent ops like write, flush etc. But the 
> current flush call silently succeeds!
> In standard Azure-backed HBase clusters the WAL is on a page blob, so this 
> issue causes data loss in HBase: HBase thinks a WAL write was hflushed and 
> marks the row write successful, when in fact the row never reached storage.
> Checking the lastError variable at the end of the flush op will solve the 
> issue: flush() itself will then throw the IOE.
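The proposed check can be sketched like this; field and method names are illustrative, not the actual PageBlobOutputStream members:

```java
import java.io.IOException;

// Illustrative sketch: flush() checks the error recorded by the background
// write thread and rethrows it, instead of silently succeeding.
class PageBlobStreamSketch {
    private volatile IOException lastError;

    // called by the background WriteRequest when an upload fails
    void backgroundWriteFailed(IOException e) {
        lastError = e;
    }

    void flush() throws IOException {
        // ... wait for the in-flight WriteRequest to finish ...
        if (lastError != null) {
            throw lastError; // surface the failure to the caller now
        }
    }
}
```

With the check at the end of flush(), a caller such as an HBase WAL hflush sees the failure immediately rather than on the next operation.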






[jira] [Created] (HADOOP-17323) s3a getFileStatus("/") to skip IO

2020-10-22 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17323:
---

 Summary: s3a getFileStatus("/") to skip IO
 Key: HADOOP-17323
 URL: https://issues.apache.org/jira/browse/HADOOP-17323
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.0
Reporter: Steve Loughran


Calls to "hadoop fs ls s3a://something/" add a full getFileStatus() sequence 
(HEAD, LIST, LIST) because FsShell Ls calls getFileStatus on every input. 

We should just build a root status entry immediately. There's one consequence: 
if a user has disabled the s3a existence probe, there will be no check for 
bucket existence. But that can be viewed as a "well, that's what happens" 
behaviour.






[jira] [Created] (HADOOP-17318) S3A Magic committer to make cleanup of pending uploads optional

2020-10-20 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17318:
---

 Summary: S3A Magic committer to make cleanup of pending uploads 
optional
 Key: HADOOP-17318
 URL: https://issues.apache.org/jira/browse/HADOOP-17318
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.0
Reporter: Steve Loughran
Assignee: Steve Loughran


Reported failure of magic committer block uploads because the pending upload 
ID is unknown. Likely cause: it's been aborted by another job.

# Make it possible to turn off cleanup of pending uploads in magic committer
# log more about uploads being deleted in committers
# include the upload ID in the S3ABlockOutputStream errors






[jira] [Created] (HADOOP-17313) FileSystem.get to support slow-to-instantiate FS clients

2020-10-19 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17313:
---

 Summary: FileSystem.get to support slow-to-instantiate FS clients
 Key: HADOOP-17313
 URL: https://issues.apache.org/jira/browse/HADOOP-17313
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs, fs/azure, fs/s3
Affects Versions: 3.3.0
Reporter: Steve Loughran



A recurrent problem in processes with many worker threads (hive, spark etc) is 
that calling `FileSystem.get(URI-to-object-store)` triggers the creation and 
then discarding of many FS clients, all but one for the same URL. As well as 
the direct performance hit, this can exacerbate locking problems and make 
instantiation a lot slower than it would otherwise be.

This has been observed with the S3A and ABFS connectors.

The ultimate solution here would probably be something more complicated to 
ensure that only one thread was ever creating a connector for a given URL -the 
rest would wait for it to be initialized. This would (a) reduce contention and 
CPU, IO and network load, and (b) reduce the time for all but the first thread to 
resume processing to that of the remaining time in .initialize(). This would 
also benefit the S3A connector.

We'd need something like

# A (per-user) map of filesystems being created 
# split createFileSystem into two: instantiateFileSystem and 
initializeFileSystem
# each thread to instantiate the FS, put() it into the new map
# If there was one already, discard the old one and wait for the new one to be 
ready via a call to Object.wait()
# If there wasn't an entry, call initializeFileSystem() and then, finally, call 
Object.notifyAll(), and move it from the map of filesystems being initialized 
to the map of created filesystems

This sounds too straightforward to be that simple; the troublespots are 
probably related to race conditions moving entries between the two maps and 
making sure that no thread will block on the FS being initialized while it has 
already been initialized (and so wait() will block forever).

Rather than seek perfection, it may be safest to go for a best-effort 
optimisation of the number of FS instances created/initialized. That is: it's 
better to maybe create a few more FS instances than needed than it is to block 
forever.

Something is doable here, it's just not quick-and-dirty. Testing will be "fun"; 
probably best to isolate this new logic somewhere where we can simulate slow 
starts on one thread with many other threads waiting for it.

A simpler option would be to have a lock on the construction process: only one 
FS can be instantiated per user at a time.
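The "only one thread initializes a given URL" idea can be sketched with a memoized future: the first caller runs the slow initialize, later callers block on the same task instead of building and discarding a duplicate client. A best-effort sketch, not the FileSystem.get implementation:

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.FutureTask;

// Best-effort sketch of per-URI single initialization: the first caller
// runs initialize(); later callers for the same URI wait on the same
// FutureTask rather than creating a duplicate client.
class FsCache<T> {
    private final ConcurrentHashMap<String, FutureTask<T>> creating =
        new ConcurrentHashMap<>();

    T get(String uri, Callable<T> initialize) throws Exception {
        FutureTask<T> task = new FutureTask<>(initialize);
        FutureTask<T> existing = creating.putIfAbsent(uri, task);
        if (existing == null) {
            task.run();        // we won the race: run initialize() once
            return task.get();
        }
        return existing.get(); // another thread is initializing: wait
    }
}
```

Using a future sidesteps the wait()/notifyAll() hazards described above: a late arrival can never block on an already-initialized FS, because Future.get() returns immediately once the task has completed.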







[jira] [Created] (HADOOP-17312) S3AInputStream to be resilient to failures in abort(); translate AWS Exceptions

2020-10-19 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17312:
---

 Summary: S3AInputStream to be resilient to failures in abort(); 
translate AWS Exceptions
 Key: HADOOP-17312
 URL: https://issues.apache.org/jira/browse/HADOOP-17312
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.2.1, 3.3.0
Reporter: Steve Loughran


Stack overflow issue complaining about ConnectionClosedException during 
S3AInputStream close(), seems triggered by an EOF exception in abort. That is: 
we are trying to close the stream and it is failing because the stream is 
closed. oops.

https://stackoverflow.com/questions/64412010/pyspark-org-apache-http-connectionclosedexception-premature-end-of-content-leng

Looking @ the stack, we aren't translating AWS exceptions in abort() to IOEs, 
which may be a factor.






[jira] [Resolved] (HADOOP-17261) s3a rename() now requires s3:deleteObjectVersion permission

2020-10-14 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17261.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

> s3a rename() now requires s3:deleteObjectVersion permission
> ---
>
> Key: HADOOP-17261
> URL: https://issues.apache.org/jira/browse/HADOOP-17261
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.4.0
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 2h 10m
>  Remaining Estimate: 0h
>
> With the directory marker change (HADOOP-13230) you need the 
> s3:deleteObjectVersion permission in your role, else the operation will fail 
> in the bulk delete, *if S3Guard is in use*
> Root cause
> -if fileStatus has a versionId, we pass that in to the delete KeyVersion pair
> -an unguarded listing doesn't get that versionId, so this is not an issue
> -but if files in a directory were previously created such that S3Guard has 
> their versionId in its tables, that is used in the request
> -which then fails if the caller doesn't have the permission
> Although we say "you need s3:delete*", this is a regression as any IAM role 
> without the permission will have rename fail during delete






[jira] [Resolved] (HADOOP-17293) S3A to always probe S3 in S3A getFileStatus on non-auth paths

2020-10-08 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17293?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17293.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

>  S3A to always probe S3 in S3A getFileStatus on non-auth paths
> --
>
> Key: HADOOP-17293
> URL: https://issues.apache.org/jira/browse/HADOOP-17293
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.3.1
>Reporter: Steve Loughran
>Assignee: Steve Loughran
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> an incidental part of HADOOP-13230 was a fix to innerGetFileStatus, wherein 
> after a HEAD request we would update the DDB record, so resetting its TTL.
> Applications which did remote updates of buckets without going through 
> s3guard now trigger failures in cluster applications when they go to open 
> the file






[jira] [Resolved] (HADOOP-17300) FileSystem.DirListingIterator.next() call should return NoSuchElementException

2020-10-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17300.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

Fixed in HADOOP-17281, because the changes to the contract tests there found 
the bug.

> FileSystem.DirListingIterator.next() call should return NoSuchElementException
> --
>
> Key: HADOOP-17300
> URL: https://issues.apache.org/jira/browse/HADOOP-17300
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: common, fs
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
> Fix For: 3.3.1
>
>
> FileSystem.DirListingIterator.next() call should return 
> NoSuchElementException rather than IllegalStateException
>  
> Stacktrace for new test failure:
>  
> {code:java}
> java.lang.IllegalStateException: No more items in iterator
>   at com.google.common.base.Preconditions.checkState(Preconditions.java:507)
>   at org.apache.hadoop.fs.FileSystem$DirListingIterator.next(FileSystem.java:2232)
>   at org.apache.hadoop.fs.FileSystem$DirListingIterator.next(FileSystem.java:2205)
>   at org.apache.hadoop.fs.contract.ContractTestUtils.iteratorToListThroughNextCallsAlone(ContractTestUtils.java:1495)
>   at org.apache.hadoop.fs.contract.AbstractContractGetFileStatusTest.testListStatusIteratorFile(AbstractContractGetFileStatusTest.java:366)
> {code}
>  
> CC [~ste...@apache.org]
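The contract in question is the standard java.util.Iterator one: a next() call past the last element must raise NoSuchElementException. A minimal illustration of the expected behavior (hypothetical demo code, not the Hadoop classes):

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Sketch of the java.util.Iterator contract this issue asks for:
// next() past the end must throw NoSuchElementException, not IllegalStateException.
public class ContractIteratorDemo {
    /** Consumes an iterator through next() calls alone, as the contract test does. */
    static <T> int drain(Iterator<T> it) {
        int count = 0;
        while (true) {
            try {
                it.next();           // no hasNext() guard, deliberately
                count++;
            } catch (NoSuchElementException expected) {
                return count;        // correct terminal behaviour per the contract
            }
        }
    }

    public static void main(String[] args) {
        Iterator<String> it = List.of("a", "b", "c").iterator();
        System.out.println("items=" + drain(it)); // items=3
    }
}
```

Callers probing with next() alone, like drain() above, terminate cleanly only when the iterator honours this contract.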






[jira] [Resolved] (HADOOP-17281) Implement FileSystem.listStatusIterator() in S3AFileSystem

2020-10-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17281.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

merged into 3.1+; looking forward to this. The HADOOP-16380 stats code will 
need to be wired up to this; I'm not doing it *yet* as I don't want to rebase 
everything there

> Implement FileSystem.listStatusIterator() in S3AFileSystem
> --
>
> Key: HADOOP-17281
> URL: https://issues.apache.org/jira/browse/HADOOP-17281
> Project: Hadoop Common
>  Issue Type: Task
>  Components: fs/s3
>Affects Versions: 3.3.0
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 2h 50m
>  Remaining Estimate: 0h
>
> Currently S3AFileSystem only implements the listStatus() API, which returns
> an array. Once we implement listStatusIterator(), clients can benefit from
> the async listing done recently in
> https://issues.apache.org/jira/browse/HADOOP-17074 by performing some tasks
> on files while iterating over them.
>  
> CC [~stevel]
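To sketch why an iterator helps here: pages can be fetched lazily while the client processes earlier entries, instead of materializing the whole listing as an array up front. An illustrative toy (hypothetical names, not S3AFileSystem's implementation):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Iterator;
import java.util.NoSuchElementException;

// Toy paged listing: pages are fetched on demand, so a client can start work
// on the first files before later "remote" pages have even been requested.
public class PagedListingDemo {
    static int pagesFetched = 0;

    /** Iterator over n synthetic entries, fetched in pages of pageSize. */
    static Iterator<String> listIterator(int n, int pageSize) {
        return new Iterator<String>() {
            final Deque<String> buffer = new ArrayDeque<>();
            int next = 0;

            public boolean hasNext() {
                if (buffer.isEmpty() && next < n) {   // lazily fetch the next page
                    pagesFetched++;
                    int end = Math.min(next + pageSize, n);
                    for (int i = next; i < end; i++) {
                        buffer.add("file-" + i);
                    }
                    next = end;
                }
                return !buffer.isEmpty();
            }

            public String next() {
                if (!hasNext()) throw new NoSuchElementException();
                return buffer.poll();
            }
        };
    }

    public static void main(String[] args) {
        Iterator<String> it = listIterator(10, 4);
        it.next();                        // consume just one entry...
        System.out.println(pagesFetched); // ...only one page was fetched: prints 1
    }
}
```

An array-returning listStatus() would have paid for all the pages before returning the first entry.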






[jira] [Resolved] (HADOOP-17125) Using snappy-java in SnappyCodec

2020-10-07 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17125.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

> Using snappy-java in SnappyCodec
> 
>
> Key: HADOOP-17125
> URL: https://issues.apache.org/jira/browse/HADOOP-17125
> Project: Hadoop Common
>  Issue Type: New Feature
>  Components: common
>Affects Versions: 3.3.0
>Reporter: DB Tsai
>Assignee: DB Tsai
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 24h 40m
>  Remaining Estimate: 0h
>
> In Hadoop, we use native libs for the snappy codec, which has several 
> disadvantages:
>  * It requires native *libhadoop* and *libsnappy* to be installed in the system 
> *LD_LIBRARY_PATH*, and they have to be installed separately on each node of 
> the clusters, container images, or local test environments, which adds huge 
> complexity from a deployment point of view. In some environments, it requires 
> compiling the natives from sources, which is non-trivial. Also, this approach 
> is platform dependent; the binary may not work on a different platform, so it 
> requires recompilation.
>  * It requires extra configuration of *java.library.path* to load the 
> natives, and it results in higher application deployment and maintenance cost 
> for users.
> Projects such as *Spark* and *Parquet* use 
> [snappy-java|https://github.com/xerial/snappy-java], which is a JNI-based 
> implementation. It contains native binaries for Linux, Mac, and IBM in the jar 
> file, and it can automatically load the native binaries into the JVM from the jar 
> without any setup. If a native implementation cannot be found for a 
> platform, it can fall back to a pure-Java implementation of snappy based on 
> [aircompressor|https://github.com/airlift/aircompressor/tree/master/src/main/java/io/airlift/compress/snappy].
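The load-native-or-fall-back pattern described above can be sketched as follows (the library name here is deliberately fictitious, and snappy-java's real loader additionally extracts a bundled per-platform binary from its jar first):

```java
// Sketch of the "try native, fall back to pure Java" loading strategy.
// Hypothetical demo code, not the snappy-java implementation.
public class CodecLoaderDemo {
    interface Codec { String name(); }

    static Codec loadCodec() {
        try {
            // Stand-in for loading a bundled native library; this name does not exist.
            System.loadLibrary("no_such_native_codec_demo");
            return () -> "native";
        } catch (UnsatisfiedLinkError e) {
            // Pure-Java fallback, analogous to aircompressor's snappy implementation.
            return () -> "pure-java";
        }
    }

    public static void main(String[] args) {
        System.out.println(loadCodec().name()); // prints "pure-java" here: the lib is absent
    }
}
```

Because the fallback is selected at load time, callers never need LD_LIBRARY_PATH or java.library.path configuration.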






[jira] [Created] (HADOOP-17293) refreshing S3Guard records after TTL-triggered-HEAD breaks some workflows

2020-10-02 Thread Steve Loughran (Jira)
Steve Loughran created HADOOP-17293:
---

 Summary: refreshing S3Guard records after TTL-triggered-HEAD 
breaks some workflows
 Key: HADOOP-17293
 URL: https://issues.apache.org/jira/browse/HADOOP-17293
 Project: Hadoop Common
  Issue Type: Sub-task
  Components: fs/s3
Affects Versions: 3.3.1
Reporter: Steve Loughran
Assignee: Steve Loughran


An incidental part of HDP-13230 was a fix to innerGetFileStatus, wherein after 
a HEAD request we would update the DDB record, so resetting its TTL.

Applications which did remote updates of buckets without going through S3Guard 
are now triggering failures in applications in the cluster when they go to open 
the file






[jira] [Resolved] (HADOOP-17267) Add debug-level logs in Filesystem#close

2020-09-29 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17267.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

+1, merged to trunk. Thanks!

> Add debug-level logs in Filesystem#close
> 
>
> Key: HADOOP-17267
> URL: https://issues.apache.org/jira/browse/HADOOP-17267
> Project: Hadoop Common
>  Issue Type: Improvement
>  Components: fs
>Affects Versions: 3.3.0
>Reporter: Karen Coppage
>Assignee: Karen Coppage
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 5h
>  Remaining Estimate: 0h
>
> HDFS reuses the same cached FileSystem object across the file system. If the 
> client calls FileSystem.close(), closeAllForUgi(), or closeAll() (if it 
> applies to the instance) anywhere in the system, it purges that FS instance 
> from the cache, and trying to use the instance results in an IOException: 
> FileSystem closed.
> It would be a great help to clients to see where and when a given FS instance 
> was closed. I.e. in close(), closeAllForUgi(), or closeAll(), it would be 
> great to see a DEBUG-level log of
>  * calling method name, class, file name/line number
>  * FileSystem object's identity hash (FileSystem.close() only)
> For the full calling stack, turn on TRACE logging.
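One way to capture the calling method, class, file name, and line number for such a log is java.lang.StackWalker (Java 9+). This is only an illustrative sketch, not the actual patch:

```java
// Sketch: identify who called a given method, for a debug log line.
// Hypothetical demo code; the real change would use LOG.debug(...) and also
// log the FileSystem object's identity hash in close().
public class CloseLoggerDemo {
    /** Returns "Class.method(File:line)" for the caller of the named method. */
    static String callerOf(String method) {
        return StackWalker.getInstance().walk(frames ->
            frames.dropWhile(f -> !f.getMethodName().equals(method)) // find the method itself...
                  .skip(1)                                           // ...then step to its caller
                  .findFirst()
                  .map(f -> f.getClassName() + "." + f.getMethodName()
                            + "(" + f.getFileName() + ":" + f.getLineNumber() + ")")
                  .orElse("unknown"));
    }

    static void close() {
        System.out.println("Filesystem closed by " + callerOf("close"));
    }

    public static void main(String[] args) {
        close(); // prints something like: Filesystem closed by CloseLoggerDemo.main(...)
    }
}
```

Walking only as far as the first caller keeps the DEBUG path cheap; a full stack capture can stay behind TRACE as the description suggests.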






[jira] [Resolved] (HADOOP-11202) SequenceFile crashes with client-side encrypted files that are shorter than FileSystem.getStatus(path)

2020-09-22 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-11202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-11202.
-
Resolution: Won't Fix

> SequenceFile crashes with client-side encrypted files that are shorter than 
> FileSystem.getStatus(path)
> --
>
> Key: HADOOP-11202
> URL: https://issues.apache.org/jira/browse/HADOOP-11202
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 2.2.0
> Environment: Amazon EMR 3.0.4
>Reporter: Corby Wilson
>Priority: Major
>
> Encrypted files are often padded to allow for proper encryption on a 2^n-bit 
> boundary. As a result, the encrypted file might be a few bytes bigger than 
> the unencrypted file.
> We have a case where an encrypted file is 2 bytes bigger due to padding.
> When we run a HIVE job on the file to get a record count (select count(*) 
> from ) it runs org.apache.hadoop.mapred.SequenceFileRecordReader and 
> loads the file in through a custom FS InputStream.
> The InputStream decrypts the file as it gets read in. Splits are properly 
> handled, as it extends both Seekable and PositionedReadable.
> When the org.apache.hadoop.io.SequenceFile class initializes, it reads in the 
> file size from the FileMetadata, which returns the file size of the encrypted 
> file on disk (or in this case in S3).
> However, the actual file size is 2 bytes less, so the InputStream will return 
> EOF (-1) before the SequenceFile thinks it's done.
> As a result, the SequenceFile$Reader tries to run next->readRecordLength 
> after the file has been closed, and we get a crash.
> The SequenceFile class SHOULD, instead, pay attention to the EOF marker from 
> the stream instead of the file size reported in the metadata and set the 
> 'more' flag accordingly.
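The proposed EOF-driven termination can be sketched like this (a toy reader, not SequenceFile itself): it stops when the stream signals EOF rather than when a byte count from metadata says it should.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;

// Sketch of the suggested fix: trust the stream's own EOF, never a reported
// length that may overstate the decrypted size because of cipher padding.
public class EofAwareReaderDemo {
    /** Reads 4-byte "record length" style values until EOF. */
    static int readRecords(DataInputStream in) throws IOException {
        int records = 0;
        while (true) {
            try {
                in.readInt();          // one readRecordLength-style read
                records++;
            } catch (EOFException eof) {
                return records;        // stream said EOF: set more=false, don't crash
            }
        }
    }

    public static void main(String[] args) throws IOException {
        // 10 bytes: two full ints plus 2 trailing padding bytes, like the case above.
        byte[] data = new byte[10];
        System.out.println(readRecords(new DataInputStream(new ByteArrayInputStream(data))));
        // prints 2
    }
}
```

A length-driven loop over the same 10 bytes would attempt a third read and fail, which is exactly the crash in the stack dump below.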
> Sample stack dump from crash
> 2014-10-10 21:25:27,160 WARN [main] org.apache.hadoop.mapred.YarnChild: 
> Exception running child : java.io.IOException: java.io.IOException: 
> java.io.EOFException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:304)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.next(HadoopShimsSecure.java:220)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.moveToNext(MapTask.java:199)
>   at 
> org.apache.hadoop.mapred.MapTask$TrackedRecordReader.next(MapTask.java:185)
>   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:52)
>   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:433)
>   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:344)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:162)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1491)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:157)
> Caused by: java.io.IOException: java.io.EOFException
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerChain.handleRecordReaderNextException(HiveIOExceptionHandlerChain.java:121)
>   at 
> org.apache.hadoop.hive.io.HiveIOExceptionHandlerUtil.handleRecordReaderNextException(HiveIOExceptionHandlerUtil.java:77)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.doNext(HiveContextAwareRecordReader.java:276)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:101)
>   at 
> org.apache.hadoop.hive.ql.io.CombineHiveRecordReader.doNext(CombineHiveRecordReader.java:41)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwareRecordReader.next(HiveContextAwareRecordReader.java:108)
>   at 
> org.apache.hadoop.hive.shims.HadoopShimsSecure$CombineFileRecordReader.doNextWithExceptionHandler(HadoopShimsSecure.java:302)
>   ... 11 more
> Caused by: java.io.EOFException
>   at java.io.DataInputStream.readInt(DataInputStream.java:392)
>   at 
> org.apache.hadoop.io.SequenceFile$Reader.readRecordLength(SequenceFile.java:2332)
>   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2363)
>   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2500)
>   at 
> org.apache.hadoop.mapred.SequenceFileRecordReader.next(SequenceFileRecordReader.java:82)
>   at 
> org.apache.hadoop.hive.ql.io.HiveContextAwa

[jira] [Resolved] (HADOOP-17023) Tune listStatus() api of s3a.

2020-09-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17023?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17023.
-
Fix Version/s: 3.3.1
   Resolution: Fixed


Available in 3.3.1+. 

> Tune listStatus() api of s3a.
> -
>
> Key: HADOOP-17023
> URL: https://issues.apache.org/jira/browse/HADOOP-17023
> Project: Hadoop Common
>  Issue Type: Sub-task
>  Components: fs/s3
>Affects Versions: 3.2.1
>Reporter: Mukund Thakur
>Assignee: Mukund Thakur
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 3h
>  Remaining Estimate: 0h
>
> A similar optimisation to the one done for listLocatedStatus() in 
> https://issues.apache.org/jira/browse/HADOOP-16465 can be done for the 
> listStatus() API as well.
> This is going to reduce the number of remote calls in the case of directory 
> listing.
>  
> CC [~ste...@apache.org] [~shwethags]






[jira] [Resolved] (HADOOP-17216) hadoop-aws having FileNotFoundException when accessing AWS s3 occasionally

2020-09-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-17216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-17216.
-
Fix Version/s: 3.3.0
   Resolution: Duplicate

> hadoop-aws having FileNotFoundException when accessing AWS s3 occasionally
> ---
>
> Key: HADOOP-17216
> URL: https://issues.apache.org/jira/browse/HADOOP-17216
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: fs/s3
>Affects Versions: 3.1.2
> Environment: hadoop = "3.1.2"
> hadoop-aws = "3.1.2"
> spark = "2.4.5"
> spark-on-k8s-operator = "v1beta2-1.1.2-2.4.5"
> deployed into AWS EKS Kubernetes. Version information below:
> Server Version: version.Info\{Major:"1", Minor:"16+", 
> GitVersion:"v1.16.8-eks-e16311", 
> GitCommit:"e163110a04dcb2f39c3325af96d019b4925419eb", GitTreeState:"clean", 
> BuildDate:"2020-03-27T22:37:12Z", GoVersion:"go1.13.8", Compiler:"gc", 
> Platform:"linux/amd64"}
>Reporter: Cheng Wei
>Priority: Major
> Fix For: 3.3.0
>
>
> Hi,
> When using spark streaming with deltalake, I got the following exception 
> occasionally, something like 1 out of 100. Thanks.
> {code:java}
> Caused by: java.io.FileNotFoundException: No such file or directory: 
> s3a://[pathToFolder]/date=2020-07-29/part-5-046af631-7198-422c-8cc8-8d3adfb4413e.c000.snappy.parquet
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.s3GetFileStatus(S3AFileSystem.java:2255)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.innerGetFileStatus(S3AFileSystem.java:2149)
>  at 
> org.apache.hadoop.fs.s3a.S3AFileSystem.getFileStatus(S3AFileSystem.java:2088)
>  at 
> org.apache.spark.sql.delta.files.DelayedCommitProtocol$$anonfun$8.apply(DelayedCommitProtocol.scala:141)
>  at 
> org.apache.spark.sql.delta.files.DelayedCommitProtocol$$anonfun$8.apply(DelayedCommitProtocol.scala:139)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
>  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
>  at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
>  at scala.collection.AbstractTraversable.map(Traversable.scala:104)
>  at 
> org.apache.spark.sql.delta.files.DelayedCommitProtocol.commitTask(DelayedCommitProtocol.scala:139)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatDataWriter.commit(FileFormatDataWriter.scala:78)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:247)
>  at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:242){code}






[jira] [Resolved] (HADOOP-15136) Typo in rename spec pseudocode

2020-09-18 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/HADOOP-15136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved HADOOP-15136.
-
Fix Version/s: 3.3.1
   Resolution: Fixed

Fixed by David Tucker.

> Typo in rename spec pseudocode
> --
>
> Key: HADOOP-15136
> URL: https://issues.apache.org/jira/browse/HADOOP-15136
> Project: Hadoop Common
>  Issue Type: Bug
>  Components: documentation
>Reporter: Rae Marks
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.3.1
>
>  Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> Location of issue: [rename spec 
> documentation|https://hadoop.apache.org/docs/stable/hadoop-project-dist/hadoop-common/filesystem/filesystem.html#boolean_renamePath_src_Path_d]
> The text description for what rename does in the case when the destination 
> exists and is a directory is correct. However, the pseudocode is not.
> What is written:
> {code}
> let dest = if (isDir(FS, src) and d != src) :
> d + [filename(src)]
> else :
> d
> {code}
> What I expected:
> {{isDir(FS, src)}} should be {{isDir(FS, d)}}.
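With that correction applied, the destination computation can be illustrated with a toy in-memory directory map (hypothetical helper names, not the spec's formal notation):

```java
import java.util.Map;

// Illustrative sketch of the corrected pseudocode: the branch must test
// whether the *destination* d is a directory, not the source.
public class RenameDestDemo {
    static String filename(String p) {
        return p.substring(p.lastIndexOf('/') + 1);
    }

    /** dest = d + [filename(src)] iff d is an existing directory and d != src. */
    static String dest(Map<String, Boolean> isDir, String src, String d) {
        if (Boolean.TRUE.equals(isDir.get(d)) && !d.equals(src)) {
            return d + "/" + filename(src);
        }
        return d;
    }

    public static void main(String[] args) {
        Map<String, Boolean> isDir = Map.of("/a/file", false, "/b", true);
        System.out.println(dest(isDir, "/a/file", "/b"));      // /b/file : rename into a directory
        System.out.println(dest(isDir, "/a/file", "/b/file")); // /b/file : d is not a directory
    }
}
```

With the original `isDir(FS, src)` test, renaming a directory onto a plain-file destination would wrongly append the source name, which is why the text and pseudocode disagreed.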





