+1, binding. Awesome piece of work!
I've done three forms of qualification, all related to s3 and azure storage:

1. tarball validation, CLI use
2. build/test of downstream modules off the maven artifacts; mine and some other ASF ones. I (and it is very much me) have broken some downstream modules' tests, as I discuss below. PRs have been submitted to the relevant projects.
3. local rerun of the hadoop-aws and hadoop-azure test suites

*Regarding issues which surfaced*

Wei-Chiu: can you register your public GPG key with the public keyservers? The gpg client apps let you do this. Then we can coordinate signing each other's keys (a rough sketch of the commands is at the end of this section).

Filed PRs for the test regressions:
https://github.com/apache/hbase-filesystem/pull/23
https://github.com/GoogleCloudDataproc/hadoop-connectors/pull/569

*Artifact validation*

SHA checksum good:

  shasum -a 512 hadoop-3.3.1-RC3.tar.gz
  b80e0a8785b0f3d75d9db54340123872e39bad72cc60de5d263ae22024720e6e824e022090f01e248bf105e03b0f06163729adbe15b5b0978bae0447571e22eb  hadoop-3.3.1-RC3.tar.gz

GPG: trickier, because Wei-Chiu's key wasn't trusted.

  > gpg --verify hadoop-3.3.1-RC3.tar.gz.asc
  gpg: assuming signed data in 'hadoop-3.3.1-RC3.tar.gz'
  gpg: Signature made Tue Jun 1 11:00:41 2021 BST
  gpg: using RSA key CD32D773FF41C3F9E74BDB7FB362E1C021854B9D
  gpg: requesting key 0xB362E1C021854B9D from hkps server hkps.pool.sks-keyservers.net
  gpg: Can't check signature: No public key

*Wei-Chiu: can you add your public keys to the GPG key servers*

To validate the keys I went to the directory where I have our site under svn (https://dist.apache.org/repos/dist/release/hadoop/common) and, after reinstalling svn (where did it go? when did it go?), did an svn update to get the keys.

Did a gpg import of the KEYS file, which added:

  gpg: key 0x386D80EF81E7469A: public key "Brahma Reddy Battula (CODE SIGNING KEY) <bra...@apache.org>" imported
  gpg: key 0xFC8D04357BB49FF0: public key "Sammi Chen (CODE SIGNING KEY) <sammic...@apache.org>" imported
  gpg: key 0x36243EECE206BB0D: public key "Masatake Iwasaki (CODE SIGNING KEY) <iwasak...@apache.org>" imported
  *gpg: key 0xB362E1C021854B9D: public key "Wei-Chiu Chuang <weic...@apache.org>" imported*

This time the import did work, but Wei-Chiu isn't trusted by anyone yet:

  gpg --verify hadoop-3.3.1-RC3.tar.gz.asc
  gpg: assuming signed data in 'hadoop-3.3.1-RC3.tar.gz'
  gpg: Signature made Tue Jun 1 11:00:41 2021 BST
  gpg: using RSA key CD32D773FF41C3F9E74BDB7FB362E1C021854B9D
  gpg: Good signature from "Wei-Chiu Chuang <weic...@apache.org>" [unknown]
  gpg: WARNING: This key is not certified with a trusted signature!
  gpg:          There is no indication that the signature belongs to the owner.
  Primary key fingerprint: CD32 D773 FF41 C3F9 E74B DB7F B362 E1C0 2185 4B9D

(Wei-Chiu, let's coordinate signing each other's public keys via a slack channel; you need to be in the apache web of trust.)

  > time gunzip hadoop-3.3.1-RC3.tar.gz

(5 seconds)

cd into the hadoop dir; cp my confs in:

  cp ~/(somewhere)/hadoop-conf/* etc/hadoop/

cp the hadoop-azure dependencies from share/hadoop/tools/lib/ to share/hadoop/common/lib (products built targeting Azure put things there).

run: all the s3a "qualifying an AWS SDK update" commands
https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/testing.html#Qualifying_an_AWS_SDK_Update

run: basic abfs:// FS operations; again, no problems.

FWIW I think we should consider having the hadoop-aws module and its dependencies, and the aws ones, in hadoop-common/lib. I can get them there through env vars and the s3guard shell sets things up, but azure is fiddly.
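On the key-signing exchange mentioned above: a rough sketch of the commands, assuming keyserver.ubuntu.com as the hkps server (any keyserver works) and Wei-Chiu's key id from the output above:

  # publish your own public key so others can fetch it
  gpg --keyserver hkps://keyserver.ubuntu.com --send-keys <your-key-id>

  # fetch the other key, verify the fingerprint out of band, then sign it
  gpg --keyserver hkps://keyserver.ubuntu.com --recv-keys 0xB362E1C021854B9D
  gpg --sign-key 0xB362E1C021854B9D

  # push the signature back so the web of trust picks it up
  gpg --keyserver hkps://keyserver.ubuntu.com --send-keys 0xB362E1C021854B9D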
*Build and test cloudstore JAR; invoke from CLI*

This is my cloud-storage extension library:
https://github.com/steveloughran/cloudstore

I've always intended to put it into hadoop, but as it is, it's where a lot of the diagnostics live and a quick way to put together fixes ("here's a faster du": dux).

  https://github.com/steveloughran/cloudstore.git

Modify the hadoop-3.3 profile to use the 3.3.1 artifacts, then build with snapshots enabled. Because I'd not (yet) built any 3.3.1 artifacts locally, this fetched them from maven staging:

  mvn package -Phadoop-3.3 -Pextra -Psnapshots-and-staging

Set up the env var $CLOUDSTORE to point to the JAR and $BUCKET to an s3a bucket, then run various commands (storediag, cloudup, ...). As an example, here's the "dux" command, which is "hadoop fs -du" with a parallel scan underneath the dir for better scaling:

  bin/hadoop jar $CLOUDSTORE dux -threads 64 -limit 1000 -verbose s3a://stevel-london/

Output is in https://gist.github.com/steveloughran/664d30cef20f605f3164ad01f92a458a

*Build and (unit) test google GCS*

Two test failures, one of which was classpath related; the other was just a new rename contract test needing a new setting in gs.xml to declare what rename of a file over a file does. Everything is covered in:
https://github.com/GoogleCloudDataproc/hadoop-connectors/pull/569

Classpath: assertJ not coming through the hadoop-common test JAR dependencies.

  [ERROR] com.google.cloud.hadoop.fs.gcs.contract.TestInMemoryGoogleContractRootDirectory.testSimpleRootListing  Time elapsed: 0.093 s  <<< ERROR!
  java.lang.NoClassDefFoundError: org/assertj/core/api/Assertions
  Caused by: java.lang.ClassNotFoundException: org.assertj.core.api.Assertions

This happens because I added some tests to AbstractContractRenameTest which use assertJ assertions. AssertJ is declared in test scope for the hadoop-common test JAR; it's somehow not propagating. HBoss has the same issue.

  <dependency>
    <groupId>org.assertj</groupId>
    <artifactId>assertj-core</artifactId>
    <scope>test</scope>
  </dependency>

I really don't understand what is up with our declared exports; I've just reviewed them. Nothing we can do about it that I can see.

The rename test failure is from a new test, with the expected behaviour needing definition:

  [ERROR] Failures:
  [ERROR]   TestInMemoryGoogleContractRename>AbstractContractRenameTest.testRenameFileOverExistingFile:131->Assert.fail:89 expected rename(gs://fake-in-memory-test-bucket/contract-test/source-256.txt, gs://fake-in-memory-test-bucket/contract-test/dest-512.txt) to be rejected with exception, but got false

Fix: declare "fs.contract.rename-returns-false-if-dest-exists" = true in the XML contract; a sketch of the property follows.
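The contract option is a normal Hadoop configuration property in the connector's contract file (gs.xml here); something like:

  <property>
    <name>fs.contract.rename-returns-false-if-dest-exists</name>
    <value>true</value>
  </property>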
*Build and test HBoss*

This is the HBase extension which uses ZK to lock file accesses on S3. I've broken their build through changes to the internal S3 client factory, as some new client options were passed down (HADOOP-13551). That change moved to a new build parameter object, so we can add future changes without breaking the signature again (mehakmeet already has done so in HADOOP-17705).

https://issues.apache.org/jira/browse/HBASE-25900

Got an initial PR up, though I will need to do more so that it also compiles/tests against older builds:
https://github.com/apache/hbase-filesystem/pull/23

*Build spark, then test S3A Committers through it*

Built spark-3 against 3.3.1, then ran the integration tests against S3 london. Tests are in:
https://github.com/hortonworks-spark/cloud-integration.git

Most of an afternoon was frittered away dealing with the fact that the spark version move (2.4 to 3.2) meant a scalatest upgrade from 3.0 to 3.2, **and every single test failed to compile because the scalatest project moved the foundational test suite into a new package**. I had to do the same upgrade to test my WiP manifest committer (MAPREDUCE-7341) against ABFS, so it's not completely wasted. It does mean that the module and its tests are spark 3+ only.

*hadoop-aws and hadoop-azure test suites*

For these I checked out branch-3.3.1, rebuilt it and ran the test suites in the hadoop-azure and hadoop-aws modules; this triggered a rebuild of those two modules. I did this after doing all the other checks, so everything else was qualified against the genuine RC3 artifacts. (Full invocations are sketched at the end of this mail.)

hadoop-aws
  run 1: -Dparallel-tests -DtestsThreadCount=5 -Dmarkers=keep
  run 2: -Dparallel-tests -DtestsThreadCount=6 -Ds3guard -Dscale -Ddynamo

azure
  -Dparallel-tests=abfs -DtestsThreadCount=5 -Dscale

  [ERROR] Errors:
  [ERROR]   ITestAbfsFileSystemContractSecureDistCp>AbstractContractDistCpTest.testDistCpWithIterator:642 » TestTimedOut
  [INFO]

This is https://issues.apache.org/jira/browse/HADOOP-17628

Overall then:

1. All the production code is good.
2. Some expansion of the filesystem tests requires changes downstream, and the change to the S3 client factory from HADOOP-13551 stopped the HBoss tests, which use an internal interface, from compiling. The move to a parameter object (and documenting this use) is intended to prevent this recurring.
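For reference, the hadoop-aws/hadoop-azure runs above are the usual per-module maven integration test runs with the flags listed; roughly, assuming the standard auth-keys.xml binding in each module (not shown here):

  # hadoop-aws (from hadoop-tools/hadoop-aws), run 1 and run 2
  mvn verify -Dparallel-tests -DtestsThreadCount=5 -Dmarkers=keep
  mvn verify -Dparallel-tests -DtestsThreadCount=6 -Ds3guard -Dscale -Ddynamo

  # hadoop-azure (from hadoop-tools/hadoop-azure)
  mvn verify -Dparallel-tests=abfs -DtestsThreadCount=5 -Dscale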