Summary: Below are two PRs that prepare us for adding new GitHub actions. I also want to ask for your patience and set expectations on review standards for new actions. More below...
--

As you probably know, insecure GitHub actions on public repositories have been a big source of supply chain vulnerabilities recently. After multiple discussions on this topic with folks from Hadoop and other Apache projects, I've realized that the knowledge required to prevent vulnerabilities when writing GitHub CI actions is substantial, and very few people (myself included) understand it fully.

I just posted a PR adding a document about securing GitHub actions on public repositories, to help get contributors up to speed quickly:

https://github.com/apache/hadoop/pull/8437
(go to changed files -> ... -> view file or "rich diff" for rendered markdown)

One of the first mitigations suggested by GitHub is to run CodeQL scans on all your workflow definitions. This PR enables that:

https://github.com/apache/hadoop/pull/8428

For workflow reviews, two things I think are really important:

1. Security. We have a foundational position in many software stacks.
2. Fast / Efficient / Reliable CI. I often mention lofty goals here because I've seen it pay off, but I also want to enable scaled-down testing for contributor forks with limited resources and/or platforms available.

Hope this context helps when I'm picky in reviewing these. I never enjoy giving even a rare -1 on merging a PR, but I want to avoid CI workflows that are inefficient or that do not clearly demonstrate careful handling of security.

In addition to the document above, I think we should structure our .github workflows in a way that makes a very clear distinction between privileged and unprivileged workflows. The intention is to keep us safe in the future as new contributors come and go, by making this confusing area a lot clearer. I think a lot of projects fall short here.

There are other guardrails we could consider: requiring a CODEOWNERS-style review on actions changes, asking infra to narrow our default GITHUB_TOKEN permissions to read-only, etc.

Look forward to your feedback.
Especially interested in best practices from other projects. So far I've looked at Spark and Knox. (Aside: there is an interesting opportunity to work across projects to share containers for docker-compose-style integration testing. More on this later.)

Thanks for your patience and understanding,
Aaron
