[
https://issues.apache.org/jira/browse/HADOOP-19868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18075485#comment-18075485
]
ASF GitHub Bot commented on HADOOP-19868:
-----------------------------------------
pan3793 commented on code in PR #8450:
URL: https://github.com/apache/hadoop/pull/8450#discussion_r3125722408
##########
.github/workflows/tmpl_build_and_test.yml:
##########
@@ -86,6 +90,19 @@ jobs:
name: Build Image ${{ inputs.os }}-${{ inputs.branch }}
runs-on: ubuntu-24.04
needs: [ precondition ]
+ # Security: this does not leak write access for our image repository to
+ # forked repos.
+ #
+ # We have `packages: write` permissions for our GITHUB_TOKEN below.
However:
+ #
+ # - For `pull_request`, GitHub downgrades GITHUB_TOKEN permissions to
+ # read-only.
+ # - For `push` triggers on a fork, the GITHUB_TOKEN retains write
+ # permissions, but the `push` is happening in the context of the fork,
not
+ # the upstream repo.
+ # - For `pull_request_target` (not used here), image repo permissions are
+ # scoped to the repository they run on. This prevents forks from writing
+ # to our Apache Hadoop image repo.
Review Comment:
details explanation of my understanding
suppose Tom opens a PR target to apache/hadoop trunk branch from a forked
repo, which adds a workflow yaml defines `pull_request_target`, then nothing
happens on this PR, because GHA looks at the apache/hadoop trunk branch, and it
does not have such a workflow.
it this PR gets merged, then the next PR targeting the apache/hadoop trunk
branch will trigger this workflow, and it uses the trunk's version and ignores
the PR's change in this workflow yaml. what's dangerous here is - the workflow
might consume the resource from the PR change, for example, the workflow use
the Dockerfile to build an image and push to apache/hadoop image repo, the
context only protects the workflow yaml file itself (also use the trunk branch
version instead of PR's version), but it still able to read to unstrust code
from PR (because we explicitly checkout the PR code to CI workspace to verify
the PR change). so, why we are safe? because we only use `pull_request_target`
for workflows that do not read files from the source repo. currently, it's a
script that only calls the GitHub API, hackers have no opportunity to inject
their code for execution without getting a PR merged.
our CI requires some workflows to run on PR with write permission, to edit
labels, to create checks, all of which won't consume untrusted code, so we are
safe. without write permission then they stop working.
> ci: add security comments to github actions
> -------------------------------------------
>
> Key: HADOOP-19868
> URL: https://issues.apache.org/jira/browse/HADOOP-19868
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: test
> Reporter: Aaron Fabbri
> Assignee: Aaron Fabbri
> Priority: Minor
> Labels: pull-request-available
>
> Following up on HADOOP-19858, I have a patch for some `# Security:` comments
> to add to our github actions to explain why each workflow is safe.
> I'm also following up on INFRA-27839, just to double check they haven't
> enabled any risky defaults. I'll add comments with any details I find.
--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]