[ 
https://issues.apache.org/jira/browse/HADOOP-19858?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18074714#comment-18074714
 ] 

ASF GitHub Bot commented on HADOOP-19858:
-----------------------------------------

pan3793 commented on PR #8412:
URL: https://github.com/apache/hadoop/pull/8412#issuecomment-4276656967

   > 2. Spark example blocks some of their workflows from running on fork repos 
with conditionals [like this 
one](https://github.com/apache/spark/blob/a2efe5fdcf9767ee8f113f7487de1a8092d2e7c6/.github/workflows/build_non_ansi.yml#L33):
 `if github.repository == 'apache/spark'`.
   >
   > This PR is missing 2, right?
   
   @ajfabbri yes, but this is unrelated to security; the intention is to save 
resources. As I remember, forked repos also auto-ran cron jobs by default in 
those days. Note that contributors have full control over their forked repos, 
so they can remove the `if github.repository == 'apache/spark'` condition in 
their fork if they want.
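   
   A minimal sketch of such a guard, assuming a hypothetical scheduled workflow 
(the file name, workflow name, cron time, and build step are illustrative and 
not taken from Spark or this PR):
   
   ```yaml
   # Hypothetical workflow file, e.g. .github/workflows/daily.yml
   name: Daily build (sketch)
   on:
     schedule:
       - cron: '0 1 * * *'   # illustrative time, once a day (UTC)
   jobs:
     build:
       # Resource-saving guard only: the job is skipped when the workflow
       # runs in any repository other than apache/spark. It is not a
       # security boundary, because a fork owner can simply delete it.
       if: github.repository == 'apache/spark'
       runs-on: ubuntu-latest
       steps:
         - uses: actions/checkout@v4
         - run: echo "build and test steps go here"   # placeholder
   ```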
   
   Spark has a lot of profiles (Java/Scala/Python/Arrow/pandas versions, 
ANSI/non-ANSI, sbt/Maven, etc.), and the full combination produces a large 
matrix. It therefore runs only part of that matrix on PR and push, and 
schedules daily jobs on the `apache/spark` repo for the other combinations (you 
can find the status of those jobs in the README.md).
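   
   As a rough sketch of that split, a scheduled workflow could cover the 
combinations that PR and push builds skip. The matrix axes, values, and cron 
time below are assumptions for illustration, not Spark's actual configuration:
   
   ```yaml
   # Hypothetical scheduled workflow for combinations not run on PR/push.
   name: Daily extra combinations (sketch)
   on:
     schedule:
       - cron: '0 4 * * *'   # illustrative time, once a day (UTC)
   jobs:
     build:
       if: github.repository == 'apache/spark'   # upstream repo only
       runs-on: ubuntu-latest
       strategy:
         fail-fast: false            # let every combination report its result
         matrix:
           java: ['17', '21']        # illustrative versions
           ansi: ['true', 'false']   # illustrative profile switch
       steps:
         - uses: actions/checkout@v4
         - uses: actions/setup-java@v4
           with:
             distribution: temurin
             java-version: ${{ matrix.java }}
         - run: echo "build with ansi=${{ matrix.ansi }}"   # placeholder
   ```
   
   A PR/push workflow would then pin a single value per axis, so each change 
triggers one job instead of the four that this matrix expands to.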
   
   > Spark uses these workflows from privileged context on the official repo 
(workflow "cron" triggers that can't be triggered by forks) as well as allowing 
them to run on forks. The thing I dislike about their example is it lacks clear 
separation of privilege. I think we could do better on that.
   
   I'm not sure what your definition of "privileged" is. I think this is a 
normal use case of GHA: the same workflow can be triggered by different events 
from different contexts. We just need to be careful with one case: a workflow 
that is triggered in the upstream context and consumes untrusted code from a 
PR.
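   
   That case can be sketched as follows, assuming it refers to the 
`pull_request_target` trigger (an assumption; no trigger is named above). The 
workflow name and steps are hypothetical:
   
   ```yaml
   # Hypothetical workflow illustrating the risky combination.
   name: PR build (sketch)
   on:
     pull_request_target:     # runs in the base (upstream) repo context
   permissions:
     contents: read           # keep the token read-only
   jobs:
     build:
       runs-on: ubuntu-latest
       steps:
         - uses: actions/checkout@v4
           with:
             # Checks out the contributor's commit: untrusted input.
             ref: ${{ github.event.pull_request.head.sha }}
         # Risky: any build script from the PR would now execute in the
         # upstream context, where repository secrets can be exposed.
         - run: echo "building the PR code here would run untrusted scripts"
   ```
   
   With the plain `pull_request` trigger, runs for PRs from forks get a 
read-only token and no repository secrets, so the same steps do not carry this 
risk.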
   
   > Are you thinking that we 1. build the hadoop-build (dev-support) 
container. 2. run maven build in that container. 3. build pre-installed images 
(like hadoop-trunk, hadoop-, hadoop-3.5, etc.) 4. run 
tests on those installed images?
   
   I don't think this is something we want to do for unit tests. Testing 
generally requires more dev dependencies, which the runtime might not need. For 
example, Maven lets us define dependencies in compile, runtime, and test 
scopes; when running UTs, it pulls the runtime and test scope deps into the 
classpath. Similar things apply to native libs.
   
   TBH, I haven't seen many production use cases that deploy Hadoop in 
containers. The current "pre-installed images" are mostly used by downstream 
projects for testing, and they only cover a few cases. Kerberized YARN, at 
least, is unlikely to work: for Hadoop 3.4.x, the official Hadoop binary tgz 
was built on Ubuntu 20.04 with OpenSSL 1.x, while the pre-installed images are 
based on Ubuntu 22.04, which only has OpenSSL 3.x in its apt repo. IIRC, this 
prevents the kerberized Linux container from starting.
   
   Building some smoke/integration tests like [Spark 
kubernetes/integration-tests](https://github.com/apache/spark/blob/branch-4.1/resource-managers/kubernetes/integration-tests/README.md)
 on top of the "pre-installed images" might be a good supplement in the future, 
but this is out of scope for the current goals.




> Set up build workflow in GitHub Actions
> ---------------------------------------
>
>                 Key: HADOOP-19858
>                 URL: https://issues.apache.org/jira/browse/HADOOP-19858
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: build
>            Reporter: Cheng Pan
>            Priority: Major
>              Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)
