Great, thanks everyone! I went ahead and filed an infra ticket to ask for buckets/credentials:
https://issues.apache.org/jira/browse/INFRA-24353 I'll keep you posted on progress. Steve, yes, I'm planning to start a HADOOP-19343 merge discuss/vote soon. Chris Nauroth On Thu, Jul 24, 2025 at 4:47 AM Steve Loughran <ste...@cloudera.com.invalid> wrote: > Didn't know about the ASF credentials. We'd want them to be used somewhere > to generate those session credentials, with those credentials the only > secrets that a test run would have. > > I"d thought of somehow generating restricted session credentials to the > target bucket only, and with a duration of 60 minutes -loss of credentials > would only have marginal effect, primarily one of cost rather than > privilege. > > > > > > > > One nice aspect of GitHub Actions is that they can also be run on > > individual forks. Contributors can configure their own AWS credentials > > as secrets in their forks of the Hadoop repo and run the tests there. > > This would help avoid consuming ASF resources directly. If ASF > > credentials aren’t available, a link to the successful run on their > > fork can also be included as a comment on the PR to confirm the test > > results. > > > > > +1 > > > > This was just an early idea I had back then—feel free to explore it > > further if it seems useful. > > > > -Ayush > > > > [1] https://issues.apache.org/jira/browse/INFRA-24353 > > > > On Thu, 24 Jul 2025 at 04:30, Chris Nauroth <cnaur...@apache.org> wrote: > > > > > > Hello everyone, > > > > > > For years, we've relied on specific contributors to run and verify the > > > integration tests for object store integrations like S3A, because the > > tests > > > require credentials for specific cloud providers. I'd like to explore > if > > we > > > have any path forward today to bringing those tests into the pre-submit > > > automation. If successful, I'd like to apply that strategy to the GCS > > > integration tests, which are part of HADOOP-19343. > > > > thinking about this -do you think this stuff should be merged in and > stabilize in place? > You've all been working on it for a while > > > > > > To make this work, we'd need to either 1) run tests in a VM hosted in > the > > > cloud provider, where credentials are vended natively from an adjacent > > > metadata server, or > > > Impala does this > > > > 2) export credentials so that the tests can run in any > > > VM outside the cloud provider (and be really, really, really careful to > > > secure the access to those exported credentials). > > > > If I could wire up my own credentials to github credentials/actions, I'd > locally generated a 12 hour session triple and upload them to github > secrets for my own actions only. I'd need to somehow set up the test run so > that > > 1. the binding info is picked up (i.e. auth-keys.xml) created in the > right place -or that is modified to fall back to env vars (it probably > already does this for aws credentials, so it's only the target bucket > to be > picked up, e.g. HADOOP_AWS_TARGET_BUCKET. Easily done. > 2. maven test runs exclude the root bucket tests and instead pick up a > run ID to use as the base path for tests. The build is set up for ths. > > > Running tests with an env var test target rather than an auth-keys file > could be done with something in core-site.xml which would set the test > target to that of an env var; auth-keys.xml would override it. > > <!-- why do we have two? --> > <property> > <name>test.fs.s3a.name</name> > <value>${env.HADOOP_AWS_BUCKET:-s3a://none}</value> > </property> > > <property> > <name>fs.contract.test.fs.s3a</name> > <value>${test.fs.s3a.name}</value> > </property> > > <include xmlns="http://www.w3.org/2001/XInclude" href="auth-keys.xml"> > <fallback/> > </include> > > we'd need some special handling in test setup/S3AContract to recognise that > "s3a://none" is a special marker to indicate there is no target FS. Again, > easily done. > > Summary of thoughts: > > 1. we put env var binding into core-site.xml with S3AContract support > 2. github action can run the itests without root bucket tests enabled > (if they ever want to test PRs in parallel) > 3. people can upload their own (session) credentials with very > restricted roles > 4. document this > 5. let someone bold try it out > > > There's also flaky tests. My Junit5 ITest PR adds a @FlakyTest tag which > could be used to turn off those which are a bit brittle -but should only be > used if the behavior is unfixable (network buffer overruns in > (AbstractContractUnbufferTest) is the only legit use I can see > > > > > > Has anyone else already explored this recently? If not, I was thinking > of > > > filing an INFRA ticket to discuss if they already have established > > patterns > > > for this. This is potentially relevant to other projects. (It was the > > code > > > review for FLINK-37247 that prompted me to start this conversation.) I > > > think it makes sense to solve it in Hadoop first and then extend it to > > > other projects. > > > > > > > Spark and Iceberg use docker and Minio. Good: you only need docker. Bad: > it's still some variant of a mock test, as passing it says very little > about things working with the real stores. I wouldn't trust a PR to go in > with only that. > > Anyway, I like everyone to test in their own setup as that helps find cases > where the connector is brittle to different deployment setups. The more > diverse test environments are, the more issues get found and fixed before > we ship >