steveloughran opened a new pull request, #6308:
URL: https://github.com/apache/hadoop/pull/6308

   
   Add support for S3Express storage in S3A connector production and test code.
   
   Contains HADOOP-18955. AWS SDK v2: add path capability probe
   "fs.s3a.capability.aws.v2"
   * lets callers probe for the AWS SDK version in use (see the
     hasPathCapability sketch below)
   * bucket-info reports it
   
   Contains HADOOP-18961. S3A: add s3guard command "bucket"

   hadoop s3guard bucket -create -region us-west-2 -zone usw2-az2 \
     s3a://stevel--usw2-az2--x-s3/

   * requires -zone if the bucket is zonal
   * rejects the option if it is not
   * rejects zonal bucket suffixes if the endpoint is not AWS (safety feature)
   * imperfect, but a functional starting point.
   
   New path capability "fs.s3a.capability.zonal.storage"
   * used in tests to determine whether pending uploads manifest as paths in
     listings
   * CLI tests can probe for it
   * bucket-info reports it
   * some tests disable/change assertions as appropriate
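
   Both new capabilities are visible through the standard hasPathCapability
   probe; a minimal sketch, with a placeholder bucket name:

   import org.apache.hadoop.conf.Configuration;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.Path;

   /** Sketch: probe the new S3A path capabilities. */
   public class ProbeS3ACapabilities {
     public static void main(String[] args) throws Exception {
       // placeholder bucket; any path in the target store works
       Path path = new Path("s3a://example-bucket/");
       FileSystem fs = FileSystem.get(path.toUri(), new Configuration());
       // true when the connector is built on the v2 AWS SDK
       System.out.println("v2 SDK: "
           + fs.hasPathCapability(path, "fs.s3a.capability.aws.v2"));
       // true when the bucket is S3Express zonal storage
       System.out.println("zonal: "
           + fs.hasPathCapability(path, "fs.s3a.capability.zonal.storage"));
     }
   }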
   
   ----
   
   Shell commands fail on S3Express buckets if there are pending uploads.
   
   New path capability in hadoop-common
      "fs.capability.directory.listing.inconsistent"
   
   1. S3AFS returns true on an S3Express bucket.
   2. FileUtil.maybeIgnoreMissingDirectory(fs, path, fnfe) decides whether to
      swallow the exception or rethrow it.
   3. This is used in: Shell, FileInputFormat, LocatedFileStatusFetcher.
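
   A minimal sketch of the calling pattern, assuming the helper rethrows the
   exception unless the filesystem declares the capability for that path:

   import java.io.FileNotFoundException;
   import java.io.IOException;
   import org.apache.hadoop.fs.FileStatus;
   import org.apache.hadoop.fs.FileSystem;
   import org.apache.hadoop.fs.FileUtil;
   import org.apache.hadoop.fs.Path;

   /** Sketch: tolerate directories which vanish mid-listing. */
   public class InconsistentListingSketch {
     static FileStatus[] listChildren(FileSystem fs, Path dir)
         throws IOException {
       try {
         return fs.listStatus(dir);
       } catch (FileNotFoundException fnfe) {
         // swallows the exception on inconsistent-listing stores
         // (e.g. S3Express with pending uploads); rethrows otherwise
         FileUtil.maybeIgnoreMissingDirectory(fs, dir, fnfe);
         return new FileStatus[0];
       }
     }
   }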
   
   Fixes with tests
   * fs -ls -R
   * fs -du
   * fs -df
   * fs -find
   * S3AFS.getContentSummary() (maybe...should discuss)
   * mapred LocatedFileStatusFetcher
   * Globber: HADOOP-15478 already fixed that when dealing with S3 inconsistencies
   * FileInputFormat
   
   S3Express CreateSession request is permitted outside audit spans.
   
   S3 bulk delete calls request that the store return the list of deleted
   objects when RequestFactoryImpl logging is set to TRACE:

   log4j.logger.org.apache.hadoop.fs.s3a.impl.RequestFactoryImpl=TRACE
   
   Test Changes
    * ITestS3AMiscOperations removes all tests which require unencrypted
      buckets; AWS S3 now defaults to SSE-S3 everywhere.
    * ITestBucketTool tests the new tool without actually creating buckets.
    * S3ATestUtils adds methods to skip test suites/cases if the store is/is
      not S3Express.
    * The "is this an S3Express bucket?" logic is cut down to checking for a
      trailing --x-s3 string, with no AZ naming logic; the relevant tests are
      commented out (see the sketch after this list).
    * ITestTreewalkProblems is validated against standard and S3Express stores.
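
   A hedged sketch of that suffix check; class and method names here are
   hypothetical, not the actual S3ATestUtils API:

   /** Hypothetical sketch of the simplified S3Express bucket probe. */
   public final class S3ExpressProbeSketch {
     private static final String S3EXPRESS_SUFFIX = "--x-s3";

     /** An S3Express bucket is identified purely by its name suffix. */
     public static boolean isS3ExpressBucket(String bucket) {
       return bucket.endsWith(S3EXPRESS_SUFFIX);
     }

     public static void main(String[] args) {
       // the bucket from the s3guard example above
       System.out.println(isS3ExpressBucket("stevel--usw2-az2--x-s3")); // true
       System.out.println(isS3ExpressBucket("example-bucket"));         // false
     }
   }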
   
   Outstanding
   
    * Distcp: tests show it fails. Proposed: cover this in the release notes.
   
   ---
   
   x-amz-checksum header not found when signing S3Express messages

   This modifies the custom signer in ITestCustomSigner to be a subclass of
   AwsS3V4Signer, to prevent signing problems with S3Express stores.
   
   ----
   
   RemoteFileChanged when renaming a multipart file

   Maps the 412 status code to RemoteFileChangedException.
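
   A hedged sketch of the mapping; the real translation lives in the S3A
   error-handling code, and the exception's constructor arguments here are
   assumptions:

   import org.apache.hadoop.fs.s3a.RemoteFileChangedException;
   import software.amazon.awssdk.awscore.exception.AwsServiceException;

   /** Hedged sketch: surface HTTP 412 as RemoteFileChangedException. */
   public final class PreconditionSketch {
     static void translate(String path, String operation,
         AwsServiceException e) throws RemoteFileChangedException {
       if (e.statusCode() == 412) {
         // 412 "precondition failed": the object changed underneath the
         // copy, so report it as a changed remote file
         throw new RemoteFileChangedException(path, operation,
             "Precondition failed: " + e.getMessage());
       }
       throw e; // AwsServiceException is unchecked; propagate others
     }
   }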
   
   Modifies the huge file tests:
   * Adds a check on etag match for stat vs list.
   * ITestS3AHugeFilesByteBufferBlocks renames parent dirs, rather than files,
     to replicate distcp better.
   
   ----
   
   S3Express custom signing cannot handle bulk delete

   Copies the custom signer into the production JAR, to enable downstream
   testing.
   
   Extends ITestCustomSigner to cover more filesystem operations:
   - PUT
   - POST
   - COPY
   - LIST
   - Bulk delete through delete() and rename()
   - List + abort multipart uploads
   
   Suite is parameterized on bulk delete enabled/disabled.
   
   To use the new signer for a full test run:
   
   <property>
     <name>fs.s3a.custom.signers</name>
     <value>CustomSdkSigner:org.apache.hadoop.fs.s3a.auth.CustomSdkSigner</value>
   </property>

   <property>
     <name>fs.s3a.s3.signing-algorithm</name>
     <value>CustomSdkSigner</value>
   </property>
   
   
   ### How was this patch tested?
   
   Let's just say it took a while.
   
   ### For code changes:
   
   - [X] Does the title of this PR start with the corresponding JIRA issue id
     (e.g. 'HADOOP-17799. Your PR title ...')?
   - [X] Object storage: have the integration tests been executed and the 
endpoint declared according to the connector-specific documentation?
   - [ ] If adding new dependencies to the code, are these dependencies
     licensed in a way that is compatible for inclusion under
     [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
   - [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, 
`NOTICE-binary` files?
   
   

