Hey Chris,

Hope you mean "https://issues.apache.org/jira/browse/INFRA-27071";


Regards,
Brahma

On Fri, Jul 25, 2025 at 4:38 AM Chris Nauroth <cnaur...@apache.org> wrote:
>
> Great, thanks everyone! I went ahead and filed an infra ticket to ask for
> buckets/credentials:
>
> https://issues.apache.org/jira/browse/INFRA-24353
>
> I'll keep you posted on progress.
>
> Steve, yes, I'm planning to start a HADOOP-19343 merge discuss/vote soon.
>
> Chris Nauroth
>
>
> On Thu, Jul 24, 2025 at 4:47 AM Steve Loughran <ste...@cloudera.com.invalid>
> wrote:
>
> > Didn't know about the ASF credentials. We'd want them to be used somewhere
> > to generate those session credentials, with those credentials the only
> > secrets that a test run would have.
> >
> > I"d thought of somehow generating restricted session credentials to the
> > target bucket only, and with a duration of 60 minutes -loss of credentials
> > would only have marginal effect, primarily one of cost rather than
> > privilege.
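> >
> > As a sketch only (using the AWS SDK v2 STS client; the role ARN and bucket
> > name below are placeholders, not anything that actually exists), minting
> > that kind of short-lived, bucket-scoped session triple could look like:
> >
> > import software.amazon.awssdk.services.sts.StsClient;
> > import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;
> > import software.amazon.awssdk.services.sts.model.Credentials;
> >
> > public class SessionCredentialsSketch {
> >   public static void main(String[] args) {
> >     // Scope-down session policy: allow S3 access to the test bucket only.
> >     String sessionPolicy = "{\"Version\": \"2012-10-17\", \"Statement\": [{"
> >         + "\"Effect\": \"Allow\", \"Action\": [\"s3:*\"], \"Resource\": ["
> >         + "\"arn:aws:s3:::example-test-bucket\","
> >         + "\"arn:aws:s3:::example-test-bucket/*\"]}]}";
> >
> >     try (StsClient sts = StsClient.create()) {
> >       Credentials creds = sts.assumeRole(AssumeRoleRequest.builder()
> >           .roleArn("arn:aws:iam::123456789012:role/hadoop-itest") // placeholder
> >           .roleSessionName("hadoop-s3a-itest")
> >           .durationSeconds(3600)  // 60-minute lifetime
> >           .policy(sessionPolicy)  // further restricts whatever the role allows
> >           .build())
> >           .credentials();
> >
> >       // These three values are the only secrets a test run would need.
> >       System.out.println("AWS_ACCESS_KEY_ID=" + creds.accessKeyId());
> >       System.out.println("AWS_SECRET_ACCESS_KEY=" + creds.secretAccessKey());
> >       System.out.println("AWS_SESSION_TOKEN=" + creds.sessionToken());
> >     }
> >   }
> > }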
> >
> >
> > >
> > >
> > > One nice aspect of GitHub Actions is that they can also be run on
> > > individual forks. Contributors can configure their own AWS credentials
> > > as secrets in their forks of the Hadoop repo and run the tests there.
> > > This would help avoid consuming ASF resources directly. If ASF
> > > credentials aren’t available, a link to the successful run on their
> > > fork can also be included as a comment on the PR to confirm the test
> > > results.
> > >
> > >
> > +1
> >
> >
> > > This was just an early idea I had back then—feel free to explore it
> > > further if it seems useful.
> > >
> > > -Ayush
> > >
> > > [1] https://issues.apache.org/jira/browse/INFRA-24353
> > >
> > > On Thu, 24 Jul 2025 at 04:30, Chris Nauroth <cnaur...@apache.org> wrote:
> > > >
> > > > Hello everyone,
> > > >
> > > > For years, we've relied on specific contributors to run and verify the
> > > > integration tests for object store integrations like S3A, because the
> > > tests
> > > > require credentials for specific cloud providers. I'd like to explore
> > if
> > > we
> > > > have any path forward today to bringing those tests into the pre-submit
> > > > automation. If successful, I'd like to apply that strategy to the GCS
> > > > integration tests, which are part of HADOOP-19343.
> > >
> >
> > Thinking about this: do you think this stuff should be merged in and
> > stabilized in place?
> > You've all been working on it for a while.
> >
> > >
> > > > To make this work, we'd need to either 1) run tests in a VM hosted in
> > the
> > > > cloud provider, where credentials are vended natively from an adjacent
> > > > metadata server, or
> >
> >
> > Impala does this
> >
> >
> > > 2) export credentials so that the tests can run in any
> > > > VM outside the cloud provider (and be really, really, really careful to
> > > > secure the access to those exported credentials).
> > >
> >
> > If I could wire up my own credentials to GitHub actions, I'd locally
> > generate a 12 hour session triple and upload them to GitHub secrets for my
> > own actions only. I'd need to somehow set up the test run so that
> >
> >    1. the binding info (i.e. auth-keys.xml) is picked up/created in the
> >    right place, or the build falls back to env vars (it probably already
> >    does this for AWS credentials, so it's only the target bucket that needs
> >    to be picked up, e.g. HADOOP_AWS_TARGET_BUCKET). Easily done.
> >    2. maven test runs exclude the root bucket tests and instead pick up a
> >    run ID to use as the base path for tests (sketched below). The build is
> >    set up for this.
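> >
> > A rough sketch of that base-path derivation (names illustrative only;
> > GITHUB_RUN_ID is set by GitHub Actions, with a timestamp fallback for local
> > runs):
> >
> > import org.apache.hadoop.fs.Path;
> >
> > public class TestRunPaths {
> >   /** Build a unique base path under the target bucket for this test run. */
> >   static Path testRunBasePath(String bucketUri) {
> >     String runId = System.getenv().getOrDefault(
> >         "GITHUB_RUN_ID", Long.toString(System.currentTimeMillis()));
> >     return new Path(bucketUri, "test-run-" + runId);
> >   }
> > }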
> >
> >
> > Running tests with an env var test target rather than an auth-keys file
> > could be done with something in core-site.xml which would set the test
> > target to that of an env var; auth-keys.xml would override it.
> >
> >  <!-- why do we have two? -->
> >   <property>
> >     <name>test.fs.s3a.name</name>
> >     <value>${env.HADOOP_AWS_BUCKET:-s3a://none}</value>
> >   </property>
> >
> >   <property>
> >     <name>fs.contract.test.fs.s3a</name>
> >     <value>${test.fs.s3a.name}</value>
> >   </property>
> >
> >   <include xmlns="http://www.w3.org/2001/XInclude" href="auth-keys.xml">
> >     <fallback/>
> >   </include>
> >
> > we'd need some special handling in test setup/S3AContract to recognise that
> > "s3a://none" is a special marker to indicate there is no target FS. Again,
> > easily done.
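> >
> > For example (just a sketch, not the actual S3AContract code; the class and
> > method names here are made up), that guard could be a JUnit 5 assumption so
> > an unconfigured run skips rather than fails:
> >
> > import org.apache.hadoop.conf.Configuration;
> > import static org.junit.jupiter.api.Assumptions.assumeTrue;
> >
> > public class TestTargetGuard {
> >   /** Marker value meaning "no test bucket was configured". */
> >   private static final String NO_TARGET = "s3a://none";
> >
> >   /** Skip the test (rather than fail) when no real bucket is bound. */
> >   static void requireTestBucket(Configuration conf) {
> >     String target = conf.getTrimmed("test.fs.s3a.name", NO_TARGET);
> >     assumeTrue(!NO_TARGET.equals(target),
> >         "No S3A test bucket configured; skipping integration test");
> >   }
> > }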
> >
> > Summary of thoughts:
> >
> >    1. we put env var binding into core-site.xml with S3AContract support
> >    2. github action can run the itests without root bucket tests enabled
> >    (if they ever want to test PRs in parallel)
> >    3. people can upload their own (session) credentials with very
> >    restricted roles
> >    4. document this
> >    5. let someone bold try it out
> >
> >
> > There are also flaky tests. My JUnit 5 ITest PR adds a @FlakyTest tag which
> > could be used to turn off those which are a bit brittle, but it should only
> > be used if the behavior is unfixable (network buffer overruns in
> > AbstractContractUnbufferTest are the only legit use I can see).
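> >
> > For reference, a hypothetical shape for such a tag (the real annotation in
> > the PR may well differ):
> >
> > import java.lang.annotation.ElementType;
> > import java.lang.annotation.Retention;
> > import java.lang.annotation.RetentionPolicy;
> > import java.lang.annotation.Target;
> > import org.junit.jupiter.api.Tag;
> >
> > // Meta-annotated with @Tag so JUnit 5 treats any @FlakyTest test as "flaky".
> > @Target({ElementType.TYPE, ElementType.METHOD})
> > @Retention(RetentionPolicy.RUNTIME)
> > @Tag("flaky")
> > public @interface FlakyTest {
> > }
> >
> > Tagged tests could then be filtered out of a CI run via Surefire's JUnit 5
> > tag support (e.g. excluding the "flaky" group).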
> >
> > >
> > > > Has anyone else already explored this recently? If not, I was thinking
> > of
> > > > filing an INFRA ticket to discuss if they already have established
> > > patterns
> > > > for this. This is potentially relevant to other projects. (It was the
> > > code
> > > > review for FLINK-37247 that prompted me to start this conversation.) I
> > > > think it makes sense to solve it in Hadoop first and then extend it to
> > > > other projects.
> > > >
> > >
> > Spark and Iceberg use Docker and MinIO. Good: you only need Docker. Bad:
> > it's still some variant of a mock test, as passing it says very little
> > about things working with the real stores. I wouldn't trust a PR to go in
> > with only that.
> >
> > Anyway, I'd like everyone to test in their own setup, as that helps find
> > cases where the connector is brittle to different deployment setups. The
> > more diverse the test environments are, the more issues get found and fixed
> > before we ship.
> >

