andygrove opened a new issue, #4671: URL: https://github.com/apache/datafusion-comet/issues/4671
## Describe the bug

The `apache-rat-plugin` is bound to the Maven `verify` phase (`pom.xml` around line 1118), so it runs during every `install` invocation, including the 6 `./mvnw ... -DskipTests install` runs in `dev/release/build-release-comet.sh`. (`-DskipTests` skips tests, not RAT.)

RAT scans the root module's directory tree. The exclude list covers `**/target/**`, `**/build/**`, `.git/**`, etc., but does NOT exclude several untracked generated/scratch directories that accumulate during a release:

- `dev/release/venv/**` (Python virtualenv, thousands of files)
- `dev/release/comet-rm/workdir/**` (docker build working dir)
- `dev/dist/**` (extracted release tarballs plus multi-MB `.tar.gz`)
- `dev/release/rat.txt`, `dev/release/filtered_rat.txt`, `dev/release/apache-rat-*.jar`

During a release build these directories are populated, so each RAT pass walks a very large number of files and the build appears to hang. Because RAT runs in-process inside the Maven JVM, there is no separate `apache-rat` process visible in `ps`, which makes it look like the build is stuck rather than busy.

## To Reproduce

Run a release build (`dev/release/build-release-comet.sh`) on a tree where `dev/release/venv` and `dev/dist` are populated, and observe the RAT step during the `mvnw install` runs.

## Expected behavior

RAT should skip generated/scratch directories that never contain source requiring license headers.

## Proposed fix

Add excludes to the `apache-rat-plugin` configuration in `pom.xml`:

```xml
<exclude>dev/release/venv/**</exclude>
<exclude>dev/release/comet-rm/workdir/**</exclude>
<exclude>dev/dist/**</exclude>
<exclude>dev/release/rat.txt</exclude>
<exclude>dev/release/filtered_rat.txt</exclude>
<exclude>dev/release/*.jar</exclude>
```

This should land on `main` and be cherry-picked to release branches as needed.
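To gauge how much extra work the scratch directories add before changing `pom.xml`, a rough file count can help. The following is a minimal sketch, not part of the repository: `count_files` and `SCRATCH_DIRS` are hypothetical names, the directory list is copied from the issue text, and the count is only a proxy for RAT's workload since it does not model RAT's own include/exclude matching.

```python
import os
import sys

# Directories named in the issue as missing from the RAT exclude list.
SCRATCH_DIRS = [
    "dev/release/venv",
    "dev/release/comet-rm/workdir",
    "dev/dist",
]


def count_files(repo_root, rel_dirs=SCRATCH_DIRS):
    """Return {relative_dir: number of non-directory entries beneath it}.

    Hypothetical helper for illustration; a missing directory counts as 0.
    """
    counts = {}
    for rel in rel_dirs:
        total = 0
        # os.walk yields nothing when the path does not exist.
        for _dirpath, _dirnames, filenames in os.walk(os.path.join(repo_root, rel)):
            total += len(filenames)
        counts[rel] = total
    return counts


if __name__ == "__main__":
    # Usage: python count_scratch.py /path/to/datafusion-comet
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for rel, n in count_files(root).items():
        print(f"{n:>8}  {rel}")
```

Running it against a checkout before and after a release build should show whether these directories account for the bulk of the files under the root module.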
