Re: Object Store Integration Tests in Pre-submit

Steve Loughran Thu, 24 Jul 2025 04:48:45 -0700

Didn't know about the ASF credentials. We'd want them to be used somewhere
to generate those session credentials, with those credentials the only
secrets that a test run would have.

I"d thought of somehow generating restricted session credentials to the
target bucket only, and with a duration of 60 minutes -loss of credentials
would only have marginal effect, primarily one of cost rather than
privilege.

>
>
> One nice aspect of GitHub Actions is that they can also be run on
> individual forks. Contributors can configure their own AWS credentials
> as secrets in their forks of the Hadoop repo and run the tests there.
> This would help avoid consuming ASF resources directly. If ASF
> credentials aren’t available, a link to the successful run on their
> fork can also be included as a comment on the PR to confirm the test
> results.
>
>
+1

> This was just an early idea I had back then—feel free to explore it
> further if it seems useful.
>
> -Ayush
>
> [1] https://issues.apache.org/jira/browse/INFRA-24353
>
> On Thu, 24 Jul 2025 at 04:30, Chris Nauroth <cnaur...@apache.org> wrote:
> >
> > Hello everyone,
> >
> > For years, we've relied on specific contributors to run and verify the
> > integration tests for object store integrations like S3A, because the
> tests
> > require credentials for specific cloud providers. I'd like to explore if
> we
> > have any path forward today to bringing those tests into the pre-submit
> > automation. If successful, I'd like to apply that strategy to the GCS
> > integration tests, which are part of HADOOP-19343.
>

thinking about this -do you think this stuff should be merged in and
stabilize in place?
You've all been working on it for a while

>
> > To make this work, we'd need to either 1) run tests in a VM hosted in the
> > cloud provider, where credentials are vended natively from an adjacent
> > metadata server, or

Impala does this

> 2) export credentials so that the tests can run in any
> > VM outside the cloud provider (and be really, really, really careful to
> > secure the access to those exported credentials).
>

If I could wire up my own credentials to github credentials/actions, I'd
locally generated a 12 hour session triple and upload them to github
secrets for my own actions only. I'd need to somehow set up the test run so
that

   1. the binding info is picked up (i.e. auth-keys.xml) created in the
   right place -or that is modified to fall back to env vars (it probably
   already does this for aws credentials, so it's only the target bucket to be
   picked up, e.g. HADOOP_AWS_TARGET_BUCKET. Easily done.
   2. maven test runs exclude the root bucket tests and instead pick up a
   run ID to use as the base path for tests. The build is set up for ths.

Running tests with an env var test target rather than an auth-keys file
could be done with something in core-site.xml which would set the test
target to that of an env var; auth-keys.xml would override it.

 <!-- why do we have two? -->
  <property>
    <name>test.fs.s3a.name</name>
    <value>${env.HADOOP_AWS_BUCKET:-s3a://none}</value>
  </property>

  <property>
    <name>fs.contract.test.fs.s3a</name>
    <value>${test.fs.s3a.name}</value>
  </property>

  <include xmlns="http://www.w3.org/2001/XInclude"; href="auth-keys.xml">
    <fallback/>
  </include>

we'd need some special handling in test setup/S3AContract to recognise that
"s3a://none" is a special marker to indicate there is no target FS. Again,
easily done.

Summary of thoughts:

   1. we put env var binding into core-site.xml with S3AContract support
   2. github action can run the itests without root bucket tests enabled
   (if they ever want to test PRs in parallel)
   3. people can upload their own (session) credentials with very
   restricted roles
   4. document this
   5. let someone bold try it out

There's also flaky tests. My Junit5 ITest PR adds a @FlakyTest tag which
could be used to turn off those which are a bit brittle -but should only be
used if the behavior is unfixable (network buffer overruns in
(AbstractContractUnbufferTest) is the only legit use I can see

>
> > Has anyone else already explored this recently? If not, I was thinking of
> > filing an INFRA ticket to discuss if they already have established
> patterns
> > for this. This is potentially relevant to other projects. (It was the
> code
> > review for FLINK-37247 that prompted me to start this conversation.) I
> > think it makes sense to solve it in Hadoop first and then extend it to
> > other projects.
> >
>
> Spark and Iceberg use docker and Minio. Good: you only need docker. Bad:
it's still some variant of a mock test, as passing it says very little
about things working with the real stores. I wouldn't trust a PR to go in
with only that.

Anyway, I like everyone to test in their own setup as that helps find cases
where the connector is brittle to different deployment setups. The more
diverse test environments are, the more issues get found and fixed before
we ship

Re: Object Store Integration Tests in Pre-submit

Reply via email to