Great, thanks everyone! I went ahead and filed an infra ticket to ask for
buckets/credentials:

https://issues.apache.org/jira/browse/INFRA-24353

I'll keep you posted on progress.

Steve, yes, I'm planning to start a HADOOP-19343 merge discuss/vote soon.

Chris Nauroth


On Thu, Jul 24, 2025 at 4:47 AM Steve Loughran <ste...@cloudera.com.invalid>
wrote:

> Didn't know about the ASF credentials. We'd want them to be used somewhere
> to generate the session credentials, with those session credentials the
> only secrets that a test run would have.
>
> I"d thought of somehow generating restricted session credentials to the
> target bucket only, and with a duration of 60 minutes -loss of credentials
> would only have marginal effect, primarily one of cost rather than
> privilege.
>
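> Something like this sketch with the AWS v2 SDK is what I have in mind -the
> role ARN, bucket name and policy below are placeholders, not anything
> agreed:
>
>   import software.amazon.awssdk.services.sts.StsClient;
>   import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;
>   import software.amazon.awssdk.services.sts.model.Credentials;
>
>   public class SessionCredentials {
>     public static void main(String[] args) {
>       // session policy scoping access down to the one target bucket
>       String policy = "{\"Version\":\"2012-10-17\",\"Statement\":[{"
>           + "\"Effect\":\"Allow\",\"Action\":\"s3:*\","
>           + "\"Resource\":[\"arn:aws:s3:::test-bucket\","
>           + "\"arn:aws:s3:::test-bucket/*\"]}]}";
>       try (StsClient sts = StsClient.create()) {
>         Credentials c = sts.assumeRole(AssumeRoleRequest.builder()
>             .roleArn("arn:aws:iam::123456789012:role/s3a-test") // placeholder
>             .roleSessionName("s3a-itest")
>             .durationSeconds(3600) // 60 minutes
>             .policy(policy)        // scope down to the bucket
>             .build()).credentials();
>         // the session triple: the only secrets a test run would hold
>         System.out.println(c.accessKeyId());
>         System.out.println(c.secretAccessKey());
>         System.out.println(c.sessionToken());
>       }
>     }
>   }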
>
> >
> >
> > One nice aspect of GitHub Actions is that they can also be run on
> > individual forks. Contributors can configure their own AWS credentials
> > as secrets in their forks of the Hadoop repo and run the tests there.
> > This would help avoid consuming ASF resources directly. If ASF
> > credentials aren’t available, a link to the successful run on their
> > fork can also be included as a comment on the PR to confirm the test
> > results.
> >
> >
> +1
>
>
> > This was just an early idea I had back then—feel free to explore it
> > further if it seems useful.
> >
> > -Ayush
> >
> > [1] https://issues.apache.org/jira/browse/INFRA-24353
> >
> > On Thu, 24 Jul 2025 at 04:30, Chris Nauroth <cnaur...@apache.org> wrote:
> > >
> > > Hello everyone,
> > >
> > > For years, we've relied on specific contributors to run and verify the
> > > integration tests for object store integrations like S3A, because the
> > > tests require credentials for specific cloud providers. I'd like to
> > > explore whether we have any path forward today to bring those tests
> > > into the pre-submit automation. If successful, I'd like to apply that
> > > strategy to the GCS integration tests, which are part of HADOOP-19343.
> >
>
> thinking about this: do you think this stuff should be merged in and
> stabilized in place? You've all been working on it for a while.
>
> >
> > > To make this work, we'd need to either 1) run tests in a VM hosted in
> > > the cloud provider, where credentials are vended natively from an
> > > adjacent metadata server, or
>
>
> Impala does this
>
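> Running in-provider means nothing needs exporting at all; the SDK's default
> chain picks the credentials straight off the metadata server. A minimal
> sketch with the v2 SDK (no Hadoop wiring assumed here):
>
>   import software.amazon.awssdk.auth.credentials.InstanceProfileCredentialsProvider;
>   import software.amazon.awssdk.services.s3.S3Client;
>
>   public class MetadataCreds {
>     public static void main(String[] args) {
>       // on a VM inside the provider, the instance metadata service vends
>       // short-lived credentials; CI never stores any secret itself
>       try (S3Client s3 = S3Client.builder()
>           .credentialsProvider(InstanceProfileCredentialsProvider.create())
>           .build()) {
>         s3.listBuckets().buckets().forEach(b -> System.out.println(b.name()));
>       }
>     }
>   }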
>
> > > 2) export credentials so that the tests can run in any VM outside the
> > > cloud provider (and be really, really, really careful to secure access
> > > to those exported credentials).
> >
>
> If I could wire up my own credentials to GitHub actions, I'd locally
> generate a 12-hour session triple and upload it to GitHub secrets for my
> own actions only. I'd need to somehow set up the test run so that
>
>    1. the binding info (i.e. auth-keys.xml) is created in the right place,
>    or the setup is modified to fall back to env vars (it probably already
>    does this for AWS credentials, so it's only the target bucket that needs
>    to be picked up, e.g. HADOOP_AWS_TARGET_BUCKET). Easily done.
>    2. maven test runs exclude the root bucket tests and instead pick up a
>    run ID to use as the base path for tests. The build is set up for this.
>
>
> Running tests with an env var test target rather than an auth-keys file
> could be done with an entry in core-site.xml that sets the test target from
> an env var; auth-keys.xml would override it.
>
>  <!-- why do we have two? -->
>   <property>
>     <name>test.fs.s3a.name</name>
>     <value>${env.HADOOP_AWS_BUCKET:-s3a://none}</value>
>   </property>
>
>   <property>
>     <name>fs.contract.test.fs.s3a</name>
>     <value>${test.fs.s3a.name}</value>
>   </property>
>
>   <include xmlns="http://www.w3.org/2001/XInclude" href="auth-keys.xml">
>     <fallback/>
>   </include>
>
> we'd need some special handling in test setup/S3AContract to recognise that
> "s3a://none" is a special marker to indicate there is no target FS. Again,
> easily done.
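>
> Roughly this shape, as a sketch only -the property and method names here
> are illustrative, not the actual S3AContract code:
>
>   import org.apache.hadoop.conf.Configuration;
>   import org.junit.jupiter.api.Assumptions;
>
>   public final class TestFsAssumptions {
>     // skip the test when the env-var fallback left the marker value in
>     // place, i.e. no real target filesystem was configured
>     public static void assumeTargetFsConfigured(Configuration conf) {
>       String fsName = conf.getTrimmed("fs.contract.test.fs.s3a", "");
>       Assumptions.assumeFalse("s3a://none".equals(fsName),
>           "no target S3A filesystem configured");
>     }
>   }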
>
> Summary of thoughts:
>
>    1. we put env var binding into core-site.xml with S3AContract support
>    2. github action can run the itests without root bucket tests enabled
>    (if they ever want to test PRs in parallel)
>    3. people can upload their own (session) credentials with very
>    restricted roles
>    4. document this
>    5. let someone bold try it out
>
>
> There are also flaky tests. My JUnit 5 ITest PR adds a @FlakyTest tag which
> could be used to turn off those which are a bit brittle -but it should only
> be used if the behavior is unfixable (the network buffer overruns in
> AbstractContractUnbufferTest are the only legit use I can see).
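>
> For anyone who hasn't looked at the PR: such a tag is just a JUnit 5
> meta-annotation, roughly the shape below -a sketch of the standard pattern,
> not necessarily the exact code in the PR. Runs can then exclude the group
> via the surefire/failsafe excludedGroups setting:
>
>   import java.lang.annotation.ElementType;
>   import java.lang.annotation.Retention;
>   import java.lang.annotation.RetentionPolicy;
>   import java.lang.annotation.Target;
>   import org.junit.jupiter.api.Tag;
>
>   // marks a test as flaky; build config can exclude the "flaky" tag
>   @Target({ElementType.TYPE, ElementType.METHOD})
>   @Retention(RetentionPolicy.RUNTIME)
>   @Tag("flaky")
>   public @interface FlakyTest {
>   }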
>
> >
> > > Has anyone else already explored this recently? If not, I was thinking
> of
> > > filing an INFRA ticket to discuss if they already have established
> > patterns
> > > for this. This is potentially relevant to other projects. (It was the
> > code
> > > review for FLINK-37247 that prompted me to start this conversation.) I
> > > think it makes sense to solve it in Hadoop first and then extend it to
> > > other projects.
> > >
> >
> Spark and Iceberg use Docker and MinIO. Good: you only need Docker. Bad:
> it's still some variant of a mock test, as passing it says very little
> about things working with the real stores. I wouldn't trust a PR to go in
> with only that.
>
> Anyway, I'd like everyone to test in their own setup, as that helps find
> cases where the connector is brittle to different deployment setups. The
> more diverse the test environments, the more issues get found and fixed
> before we ship.
>
