Summary: Below are two PRs that prepare us for adding new GitHub actions. I
also want to ask for your patience and set expectations on review standards
for new actions. More below...

--
As you probably know, insecure GitHub Actions workflows on public
repositories have recently been a major source of supply-chain
vulnerabilities.

After multiple discussions on this topic with folks from Hadoop and other
Apache projects, I've realized that the knowledge required to avoid
vulnerabilities when writing GitHub Actions CI workflows is substantial,
and very few people (myself included) fully understand it.

I just posted a PR adding a document about securing GitHub Actions on
public repositories, to help contributors get up to speed quickly:

https://github.com/apache/hadoop/pull/8437
(go to changed files -> ... -> view file or "rich diff" for rendered
markdown)

One of the first mitigations suggested by GitHub is to run CodeQL scans on
all your workflow definitions. This PR enables that:

https://github.com/apache/hadoop/pull/8428

For workflow reviews, two things I think are really important:
1. Security. We have a foundational position in many software stacks.
2. Fast / Efficient / Reliable CI. I often mention lofty goals here because
I've seen them pay off, but I also want to enable scaled-down testing for
contributor forks with limited resources and/or platforms available.

I hope this context explains why I'm picky when reviewing these. I never
enjoy casting a (rare) -1 on merging a PR, but I want to avoid CI workflows
that are inefficient or that don't clearly demonstrate careful handling of
security. In addition to the document above, I think we should structure
our .github workflows in a way that makes a very clear distinction between
privileged and unprivileged workflows. The intention is to keep us safe in
the future, as contributors come and go, by making this confusing area a
lot clearer. I think a lot of projects fall short here.
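
To make the privileged/unprivileged split concrete, here is a rough sketch
of a check we could run over .github/workflows. It assumes each workflow
file has already been parsed into a Python dict (e.g. with a YAML library).
The trigger list is a non-exhaustive assumption, and the "privileged-"
file-name prefix is a hypothetical convention for illustration, not
something GitHub or our repo defines today:

```python
from pathlib import PurePosixPath
from typing import Any

# Assumption (non-exhaustive): triggers that run in the base repository's
# context, where secrets and a write-capable GITHUB_TOKEN may be available
# even when the triggering event comes from a fork.
PRIVILEGED_TRIGGERS = {"pull_request_target", "workflow_run", "issue_comment"}

# Hypothetical naming convention: privileged workflows live in files whose
# names start with this prefix, so they stand out in review.
PRIVILEGED_PREFIX = "privileged-"


def triggers_of(workflow: dict[Any, Any]) -> set[str]:
    """Return the trigger names from a workflow parsed into a dict."""
    # PyYAML (YAML 1.1) loads the bare key `on` as the boolean True,
    # so fall back to that key if "on" is missing.
    on = workflow.get("on", workflow.get(True, {}))
    if isinstance(on, str):
        return {on}
    # List form, or mapping form where the keys are trigger names.
    return set(on)


def is_privileged(workflow: dict[Any, Any]) -> bool:
    """True if any trigger is in the assumed privileged set."""
    return bool(triggers_of(workflow) & PRIVILEGED_TRIGGERS)


def naming_mismatch(path: str, workflow: dict[Any, Any]) -> bool:
    """True if the file name disagrees with the workflow's classification."""
    named_privileged = PurePosixPath(path).name.startswith(PRIVILEGED_PREFIX)
    return named_privileged != is_privileged(workflow)


if __name__ == "__main__":
    ci = {"on": ["push", "pull_request"], "jobs": {}}
    labeler = {True: {"pull_request_target": None}, "jobs": {}}
    print(naming_mismatch(".github/workflows/ci.yml", ci))            # False
    print(naming_mismatch(".github/workflows/labeler.yml", labeler))  # True
```

Whatever the exact convention, the point is that a reviewer (or a CI check)
can tell from the file layout alone which workflows need the extra scrutiny.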

There are other guardrails we could consider: requiring CODEOWNERS-style
review on changes to actions, asking infra to narrow our default
GITHUB_TOKEN permissions to read-only, etc.
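
On the token side, a complementary guardrail is to require every workflow
to declare an explicit `permissions:` block instead of inheriting the
repository default. A minimal sketch, again assuming workflows are already
parsed into Python dicts; `permission_findings` and its messages are
hypothetical, not an existing tool:

```python
from typing import Any


def grants_write(permissions: Any) -> bool:
    """True if a `permissions:` value grants write access to any scope."""
    if permissions == "write-all":
        return True
    if isinstance(permissions, dict):
        return any(level == "write" for level in permissions.values())
    # "read-all", an empty mapping, or no block at all grants no write here.
    return False


def permission_findings(workflow: dict[str, Any]) -> list[str]:
    """List places that request write access or inherit the default token."""
    findings = []
    top = workflow.get("permissions")
    if grants_write(top):
        findings.append("<workflow>: write permissions at top level")
    for name, job in workflow.get("jobs", {}).items():
        # A job without its own block inherits the top-level block, if any.
        effective = job.get("permissions", top)
        if effective is None:
            findings.append(f"{name}: no permissions block, inherits default token")
        elif grants_write(job.get("permissions")):
            findings.append(f"{name}: write permissions")
    return findings


if __name__ == "__main__":
    wf = {
        "permissions": {"contents": "read"},
        "jobs": {
            "build": {},
            "comment": {"permissions": {"pull-requests": "write"}},
        },
    }
    print(permission_findings(wf))  # ['comment: write permissions']
```

Write access isn't forbidden by a check like this; it just makes every
write grant (and every reliance on the default) explicit and reviewable.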

I look forward to your feedback, and I'm especially interested in best
practices from other projects. So far I've looked at Spark and Knox.
(Aside: there is an interesting opportunity to work across projects to
share containers for docker-compose-style integration testing; more on
this later.)

Thanks for your patience and understanding,
Aaron
