[ https://issues.apache.org/jira/browse/HADOOP-19057?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Steve Loughran updated HADOOP-19057:
------------------------------------

Description:

The S3 test bucket used in the hadoop-aws tests of S3 Select and large file reads is no longer publicly accessible:

{code}
java.nio.file.AccessDeniedException: landsat-pds: getBucketMetadata() on landsat-pds: software.amazon.awssdk.services.s3.model.S3Exception: null (Service: S3, Status Code: 403, Request ID: 06QNYQ9GND5STQ2S, Extended Request ID: O+u2Y1MrCQuuSYGKRAWHj/5LcDLuaFS8owNuXXWSJ0zFXYfuCaTVLEP351S/umti558eKlUqV6U=):null
{code}

* Because HADOOP-18830 has cut S3 Select, all we need in 3.4.1+ is a large file for some read tests.
* Changing the default value disables the S3 Select tests on older releases.
* If fs.s3a.scale.test.csvfile is set to " ", the other tests which need it will be skipped.

Proposed:
* Locate a new large file under the (requester pays) s3a://usgs-landsat/ bucket; all releases with HADOOP-18168 can use this (a sketch of the settings involved is at the end of this description).
* Update the 3.4.1 source to use it; document it.
* Do something similar for 3.3.9, and maybe cut S3 Select there too.
* Document how to use it on older releases with requester-pays support.
* Document how to completely disable it on older releases (see the sketch after the configuration below).

h2. How to fix (most) landsat test failures on older releases

Add this to your auth-keys.xml file. Expect some failures in the few tests with hard-coded references to the bucket (assumed role delegation tokens).

{code}
<property>
  <name>fs.s3a.scale.test.csvfile</name>
  <value>s3a://noaa-cors-pds/raw/2023/017/ohfh/OHFH017d.23_.gz</value>
  <description>file used in scale tests</description>
</property>

<property>
  <name>fs.s3a.bucket.noaa-cors-pds.endpoint.region</name>
  <value>us-east-1</value>
</property>

<property>
  <name>fs.s3a.bucket.noaa-isd-pds.multipart.purge</name>
  <value>false</value>
  <description>Don't try to purge uploads in the read-only bucket, as
    it will only create log noise.</description>
</property>

<property>
  <name>fs.s3a.bucket.noaa-isd-pds.probe</name>
  <value>0</value>
  <description>Postpone existence checks to the first IO operation</description>
</property>

<property>
  <name>fs.s3a.bucket.noaa-isd-pds.audit.add.referrer.header</name>
  <value>false</value>
  <description>Do not add the referrer header</description>
</property>

<property>
  <name>fs.s3a.bucket.noaa-isd-pds.prefetch.block.size</name>
  <value>128k</value>
  <description>Use a small prefetch size so tests fetch multiple blocks</description>
</property>

<property>
  <name>fs.s3a.select.enabled</name>
  <value>false</value>
</property>
{code}
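To disable these tests entirely instead (the final item in the proposal above), a minimal sketch, assuming only the behaviour described earlier: a blank fs.s3a.scale.test.csvfile value makes the dependent tests skip, and fs.s3a.select.enabled=false turns off the S3 Select suites on releases which still have them.

{code}
<property>
  <name>fs.s3a.scale.test.csvfile</name>
  <!-- a single space: tests which need the CSV file skip rather than fail -->
  <value> </value>
</property>

<property>
  <name>fs.s3a.select.enabled</name>
  <value>false</value>
  <description>Disable the S3 Select test suites</description>
</property>
{code}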
Some delegation token tests will still fail; these have hard-coded references to the old bucket. *Do not worry about these.*

{code}
[ERROR] ITestDelegatedMRJob.testJobSubmissionCollectsTokens[0] » AccessDenied s3a://la...
[ERROR] ITestDelegatedMRJob.testJobSubmissionCollectsTokens[1] » AccessDenied s3a://la...
[ERROR] ITestDelegatedMRJob.testJobSubmissionCollectsTokens[2] » AccessDenied s3a://la...
[ERROR] ITestRoleDelegationInFilesystem>ITestSessionDelegationInFilesystem.testDelegatedFileSystem:347->ITestSessionDelegationInFilesystem.readLandsatMetadata:614 » AccessDenied
[ERROR] ITestSessionDelegationInFilesystem.testDelegatedFileSystem:347->readLandsatMetadata:614 » AccessDenied
{code}
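For the proposed move to the requester-pays s3a://usgs-landsat/ bucket, a minimal sketch of the per-bucket settings involved on releases with HADOOP-18168. The requester-pays option name is the one that JIRA added; the region value is an assumption, and no fs.s3a.scale.test.csvfile value is shown because the replacement file has not been chosen yet.

{code}
<property>
  <name>fs.s3a.bucket.usgs-landsat.requester.pays.enabled</name>
  <value>true</value>
  <description>Accept the request charges for reads of this public bucket</description>
</property>

<property>
  <name>fs.s3a.bucket.usgs-landsat.endpoint.region</name>
  <value>us-west-2</value>
  <!-- assumed bucket region; adjust if the bucket lives elsewhere -->
</property>
{code}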
> S3 public test bucket landsat-pds unreadable -needs replacement
> ---------------------------------------------------------------
>
>                 Key: HADOOP-19057
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19057
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3, test
>    Affects Versions: 3.4.0, 3.2.4, 3.3.9, 3.3.6, 3.5.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>            Priority: Critical
>              Labels: pull-request-available
>             Fix For: 3.5.0