I just merged https://github.com/apache/spark/pull/10461, a PR that adds new automated tooling to help us reason about dependency changes in Spark. Here's a summary of the changes:
- The dev/run-tests script (used in the SBT Jenkins builds and for testing Spark pull requests) now generates a file which contains Spark's resolved runtime classpath for each Hadoop profile, then compares that file to a copy which is checked into the repository. These dependency lists are found at https://github.com/apache/spark/tree/master/dev/deps; there is a separate list for each Hadoop profile.
- If a pull request changes dependencies without updating these manifest files, our test script will fail the build <https://github.com/apache/spark/pull/10461#issuecomment-168066328> and the build console output will list the dependency diff <https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/48505/consoleFull>.
- If you are intentionally changing dependencies, run ./dev/test-dependencies.sh --replace-manifest to re-generate these dependency manifests then commit the changed files and include them with your pull request.

The goal of this change is to make it simpler to reason about build changes: it should now be much easier to verify whether dependency exclusions worked properly or determine whether transitive dependencies changed in a way that affects the final classpath.

Let me know if you have any questions about this change and, as always, feel free to submit pull requests if you would like to make any enhancements to this script.

Thanks,
Josh
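
As a rough illustration of the comparison step described above, here is a minimal Python sketch. It is not the actual test script; the real check is done by the dev scripts mentioned in this message. The function names and the sample jar names are hypothetical, and the sketch assumes a manifest is simply one jar file name per line.

```python
def read_manifest(text):
    """Parse manifest text into a set of entries, skipping blank lines.

    Assumption: one jar file name per line.
    """
    return {line.strip() for line in text.splitlines() if line.strip()}


def diff_manifests(checked_in, generated):
    """Return (added, removed) entries, each as a sorted list.

    added   = entries in the freshly generated classpath but not in the checked-in copy
    removed = entries in the checked-in copy but missing from the generated classpath
    """
    old = read_manifest(checked_in)
    new = read_manifest(generated)
    return sorted(new - old), sorted(old - new)


# Made-up sample data, for illustration only.
checked_in_text = "example-lib-1.0.jar\nother-lib-2.3.jar\n"
generated_text = "example-lib-1.1.jar\nother-lib-2.3.jar\n"

added, removed = diff_manifests(checked_in_text, generated_text)
for name in removed:
    print("-", name)
for name in added:
    print("+", name)
if added or removed:
    print("Dependency manifest is out of date.")
```

A set difference is enough here because the question is only which entries appeared or disappeared; a build check along these lines would fail whenever either list is non-empty.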