pan3793 commented on code in PR #8450:
URL: https://github.com/apache/hadoop/pull/8450#discussion_r3125722408


##########
.github/workflows/tmpl_build_and_test.yml:
##########
@@ -86,6 +90,19 @@ jobs:
     name: Build Image ${{ inputs.os }}-${{ inputs.branch }}
     runs-on: ubuntu-24.04
     needs: [ precondition ]
+    # Security: this does not leak write access for our image repository to
+    # forked repos.
+    #
+    # We have `packages: write` permissions for our GITHUB_TOKEN below. 
However:
+    #
+    # - For `pull_request`, GitHub downgrades GITHUB_TOKEN permissions to
+    #   read-only.
+    # - For `push` triggers on a fork, the GITHUB_TOKEN retains write
+    #   permissions, but the `push` is happening in the context of the fork, 
not
+    #   the upstream repo.
+    # - For `pull_request_target` (not used here), image repo permissions are
+    #   scoped to the repository they run on. This prevents forks from writing
+    #   to our Apache Hadoop image repo.

Review Comment:
   details explanation of my understanding
   
   suppose Tom opens a PR target to apache/hadoop trunk branch from a forked 
repo, which adds a workflow yaml defines `pull_request_target`, then nothing 
happens on this PR, because GHA looks at the apache/hadoop trunk branch, and it 
does not have such a workflow.
   
   it this PR gets merged, then the next PR targeting the apache/hadoop trunk 
branch will trigger this workflow, and it uses the trunk's version and ignores 
the PR's change in this workflow yaml. what's dangerous here is - the workflow 
might consume the code from the PR change, for example, the workflow use the 
Dockerfile to build an image and push to apache/hadoop image repo, the context 
only protects the workflow yaml file itself (again, it use the trunk branch 
version yaml instead of PR's version), but it still able to read to unstrust 
code from PR (because we explicitly checkout the PR code to CI workspace to 
verify the PR change). so, why we are safe? because we only use 
`pull_request_target` for workflows that do not read files from the source 
repo. currently, it's a script that only calls the GitHub API, hackers have no 
opportunity to inject their code for execution without getting a PR merged.
   
   our CI requires some workflows to run on PR with write permission, to edit 
labels, to create checks, all of which won't consume untrusted code, so we are 
safe. without write permission then they stop working.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to