Hey Chris,

Hope you meant "https://issues.apache.org/jira/browse/INFRA-27071"?
Regards,
Brahma

On Fri, Jul 25, 2025 at 4:38 AM Chris Nauroth <cnaur...@apache.org> wrote:
>
> Great, thanks everyone! I went ahead and filed an infra ticket to ask for
> buckets/credentials:
>
> https://issues.apache.org/jira/browse/INFRA-24353
>
> I'll keep you posted on progress.
>
> Steve, yes, I'm planning to start a HADOOP-19343 merge discuss/vote soon.
>
> Chris Nauroth
>
> On Thu, Jul 24, 2025 at 4:47 AM Steve Loughran <ste...@cloudera.com.invalid>
> wrote:
>
> > Didn't know about the ASF credentials. We'd want them to be used somewhere
> > to generate those session credentials, with those credentials the only
> > secrets that a test run would have.
> >
> > I'd thought of somehow generating restricted session credentials to the
> > target bucket only, and with a duration of 60 minutes -loss of credentials
> > would only have a marginal effect, primarily one of cost rather than
> > privilege.
> >
> > > One nice aspect of GitHub Actions is that they can also be run on
> > > individual forks. Contributors can configure their own AWS credentials
> > > as secrets in their forks of the Hadoop repo and run the tests there.
> > > This would help avoid consuming ASF resources directly. If ASF
> > > credentials aren't available, a link to the successful run on their
> > > fork can also be included as a comment on the PR to confirm the test
> > > results.
> >
> > +1
> >
> > > This was just an early idea I had back then - feel free to explore it
> > > further if it seems useful.
> > >
> > > -Ayush
> > >
> > > [1] https://issues.apache.org/jira/browse/INFRA-24353
> > >
> > > On Thu, 24 Jul 2025 at 04:30, Chris Nauroth <cnaur...@apache.org> wrote:
> > > >
> > > > Hello everyone,
> > > >
> > > > For years, we've relied on specific contributors to run and verify the
> > > > integration tests for object store integrations like S3A, because the
> > > > tests require credentials for specific cloud providers. I'd like to
> > > > explore if we have any path forward today to bringing those tests into
> > > > the pre-submit automation. If successful, I'd like to apply that
> > > > strategy to the GCS integration tests, which are part of HADOOP-19343.
> >
> > Thinking about this -do you think this stuff should be merged in and
> > stabilized in place? You've all been working on it for a while.
> >
> > > > To make this work, we'd need to either 1) run tests in a VM hosted in
> > > > the cloud provider, where credentials are vended natively from an
> > > > adjacent metadata server, or
> >
> > Impala does this
> >
> > > > 2) export credentials so that the tests can run in any VM outside the
> > > > cloud provider (and be really, really, really careful to secure the
> > > > access to those exported credentials).
> >
> > If I could wire up my own credentials to github credentials/actions, I'd
> > locally generate a 12 hour session triple and upload them to github
> > secrets for my own actions only (rough sketch of minting such a scoped
> > triple below, after the list). I'd need to somehow set up the test run
> > so that
> >
> > 1. the binding info is picked up, i.e. auth-keys.xml is created in the
> > right place -or that is modified to fall back to env vars (it probably
> > already does this for aws credentials, so it's only the target bucket to
> > be picked up, e.g. HADOOP_AWS_TARGET_BUCKET). Easily done.
> > 2. maven test runs exclude the root bucket tests and instead pick up a
> > run ID to use as the base path for tests. The build is set up for this.
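> >
> > Something like this -completely untested, with the role ARN, bucket name
> > and class name invented- is what I mean by minting the scoped session
> > triple with the v2 SDK's STS client: it asks for short-lived credentials
> > via AssumeRole with an inline scope-down policy, and prints the three
> > values to paste into the repository secrets.
> >
> > import software.amazon.awssdk.services.sts.StsClient;
> > import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;
> > import software.amazon.awssdk.services.sts.model.Credentials;
> >
> > public class MintScopedSessionCredentials {
> >   public static void main(String[] args) {
> >     // Invented names: substitute a real role and test bucket.
> >     String bucket = "example-s3a-test-bucket";
> >     String roleArn = "arn:aws:iam::123456789012:role/s3a-itest";
> >     // Session policy restricting the credentials to the one test bucket.
> >     String policy = "{"
> >         + "\"Version\":\"2012-10-17\","
> >         + "\"Statement\":[{\"Effect\":\"Allow\",\"Action\":\"s3:*\","
> >         + "\"Resource\":[\"arn:aws:s3:::" + bucket + "\","
> >         + "\"arn:aws:s3:::" + bucket + "/*\"]}]}";
> >     try (StsClient sts = StsClient.create()) {
> >       Credentials session = sts.assumeRole(AssumeRoleRequest.builder()
> >           .roleArn(roleArn)
> >           .roleSessionName("s3a-itest-" + System.currentTimeMillis())
> >           .policy(policy)
> >           .durationSeconds(3600)  // 60 minutes; a suitably configured
> >                                   // role could allow a longer session
> >           .build())
> >           .credentials();
> >       // The three values to upload as github secrets.
> >       System.out.println("AWS_ACCESS_KEY_ID=" + session.accessKeyId());
> >       System.out.println("AWS_SECRET_ACCESS_KEY="
> >           + session.secretAccessKey());
> >       System.out.println("AWS_SESSION_TOKEN=" + session.sessionToken());
> >       System.out.println("# expires: " + session.expiration());
> >     }
> >   }
> > }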
> >
> > Running tests with an env var test target rather than an auth-keys file
> > could be done with something in core-site.xml which would set the test
> > target to that of an env var; auth-keys.xml would override it.
> >
> > <!-- why do we have two? -->
> > <property>
> >   <name>test.fs.s3a.name</name>
> >   <value>${env.HADOOP_AWS_BUCKET:-s3a://none}</value>
> > </property>
> >
> > <property>
> >   <name>fs.contract.test.fs.s3a</name>
> >   <value>${test.fs.s3a.name}</value>
> > </property>
> >
> > <include xmlns="http://www.w3.org/2001/XInclude" href="auth-keys.xml">
> >   <fallback/>
> > </include>
> >
> > We'd need some special handling in test setup/S3AContract to recognise
> > that "s3a://none" is a special marker to indicate there is no target FS.
> > Again, easily done (rough sketch at the end of this mail).
> >
> > Summary of thoughts:
> >
> > 1. we put env var binding into core-site.xml with S3AContract support
> > 2. github action can run the itests without root bucket tests enabled
> > (if they ever want to test PRs in parallel)
> > 3. people can upload their own (session) credentials with very
> > restricted roles
> > 4. document this
> > 5. let someone bold try it out
> >
> > There are also flaky tests. My Junit5 ITest PR adds a @FlakyTest tag which
> > could be used to turn off those which are a bit brittle -but it should
> > only be used if the behavior is unfixable (network buffer overruns in
> > AbstractContractUnbufferTest is the only legit use I can see).
> >
> > > > Has anyone else already explored this recently? If not, I was thinking
> > > > of filing an INFRA ticket to discuss if they already have established
> > > > patterns for this. This is potentially relevant to other projects. (It
> > > > was the code review for FLINK-37247 that prompted me to start this
> > > > conversation.) I think it makes sense to solve it in Hadoop first and
> > > > then extend it to other projects.
> >
> > Spark and Iceberg use docker and Minio. Good: you only need docker. Bad:
> > it's still some variant of a mock test, as passing it says very little
> > about things working with the real stores. I wouldn't trust a PR to go in
> > with only that.
> >
> > Anyway, I like everyone to test in their own setup, as that helps find
> > cases where the connector is brittle to different deployment setups. The
> > more diverse test environments are, the more issues get found and fixed
> > before we ship.
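> >
> > And back on the "s3a://none" marker: the special handling in test setup
> > could be no more than a JUnit 5 assumption. Rough sketch only -this is
> > not the real S3AContract code, and the class/helper names are invented:
> >
> > import org.apache.hadoop.conf.Configuration;
> >
> > import static org.junit.jupiter.api.Assumptions.assumeTrue;
> >
> > public final class TestBucketBinding {
> >
> >   /** Marker meaning "no test bucket bound via env var or auth-keys.xml". */
> >   public static final String NO_BUCKET = "s3a://none";
> >
> >   private TestBucketBinding() {
> >   }
> >
> >   /**
> >    * Return the configured test FS URI, or skip the calling test when
> >    * only the marker value is present.
> >    */
> >   public static String requireTestBucket(Configuration conf) {
> >     String fsName = conf.getTrimmed("test.fs.s3a.name", NO_BUCKET);
> >     assumeTrue(!NO_BUCKET.equals(fsName),
> >         "No S3A test bucket (set HADOOP_AWS_BUCKET or add auth-keys.xml)");
> >     return fsName;
> >   }
> > }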