[ 
https://issues.apache.org/jira/browse/HADOOP-19868?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18075485#comment-18075485
 ] 

ASF GitHub Bot commented on HADOOP-19868:
-----------------------------------------

pan3793 commented on code in PR #8450:
URL: https://github.com/apache/hadoop/pull/8450#discussion_r3125722408


##########
.github/workflows/tmpl_build_and_test.yml:
##########
@@ -86,6 +90,19 @@ jobs:
     name: Build Image ${{ inputs.os }}-${{ inputs.branch }}
     runs-on: ubuntu-24.04
     needs: [ precondition ]
+    # Security: this does not leak write access for our image repository to
+    # forked repos.
+    #
+    # We have `packages: write` permissions for our GITHUB_TOKEN below. 
However:
+    #
+    # - For `pull_request`, GitHub downgrades GITHUB_TOKEN permissions to
+    #   read-only.
+    # - For `push` triggers on a fork, the GITHUB_TOKEN retains write
+    #   permissions, but the `push` is happening in the context of the fork, 
not
+    #   the upstream repo.
+    # - For `pull_request_target` (not used here), image repo permissions are
+    #   scoped to the repository they run on. This prevents forks from writing
+    #   to our Apache Hadoop image repo.

Review Comment:
   details explanation of my understanding
   
   suppose Tom opens a PR target to apache/hadoop trunk branch from a forked 
repo, which adds a workflow yaml defines `pull_request_target`, then nothing 
happens on this PR, because GHA looks at the apache/hadoop trunk branch, and it 
does not have such a workflow.
   
   it this PR gets merged, then the next PR targeting the apache/hadoop trunk 
branch will trigger this workflow, and it uses the trunk's version and ignores 
the PR's change in this workflow yaml. what's dangerous here is - the workflow 
might consume the resource from the PR change, for example, the workflow use 
the Dockerfile to build an image and push to apache/hadoop image repo, the 
context only protects the workflow yaml file itself (also use the trunk branch 
version instead of PR's version), but it still able to read to unstrust code 
from PR (because we explicitly checkout the PR code to CI workspace to verify 
the PR change). so, why we are safe? because we only use `pull_request_target` 
for workflows that do not read files from the source repo. currently, it's a 
script that only calls the GitHub API, hackers have no opportunity to inject 
their code for execution without getting a PR merged.
   
   our CI requires some workflows to run on PR with write permission, to edit 
labels, to create checks, all of which won't consume untrusted code, so we are 
safe. without write permission then they stop working.





> ci: add security comments to github actions
> -------------------------------------------
>
>                 Key: HADOOP-19868
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19868
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: test
>            Reporter: Aaron Fabbri
>            Assignee: Aaron Fabbri
>            Priority: Minor
>              Labels: pull-request-available
>
> Following up on HADOOP-19858, I have a patch for some `# Security:` comments 
> to add to our github actions to explain why each workflow is safe. 
> I'm also following up on INFRA-27839, just to double check they haven't 
> enabled any risky defaults. I'll add comments with any details I find.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to