Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-08 Thread Jarek Potiuk
>
>
> That's a good idea. We do need to thank GitHub for giving free resources to
> ASF projects, but it would be better if we could make it a business: allow
> individual projects to sign deals with GitHub to get dedicated resources.
> It's a bit wasteful to ask every project to set up its own DevOps;
> using GitHub Actions is more convenient. Maybe we should raise it with GitHub?
>

I do not think you can get per-project resources in GH - the most you can
do is set up self-hosted runners for your project.

(BTW I am not from the INFRA team - just a humble "CI person" of Apache
Airflow, but very much invested in GitHub Actions.) Maybe the infra team can
chime in here. We did raise it with GitHub; we even had a meeting with them,
organized by Gavin, and several topics were raised that could eventually be
addressed by GitHub:

- observability (they could not give us a per-project usage dashboard - we
built our own imperfect one (with API limitations), by Tobiasz from Airflow)
- security (limiting access to only project committers) - this we handled
with Ash's fork of the runner (but it's also imperfect - even today I had to
fix a problem where we had the list of committers desynchronised between our
infra/CI.yml)
- manageability (assigning resources per project) - this works by having
self-hosted runners assigned per project (we needed an infra JIRA ticket, the
generation of a bunch of tokens for our runners, and our own AWS account
with auto-scaling).

It would indeed be great if these were available from GitHub, but so far
we do not have any of them.

J.



> On Wed, Apr 7, 2021 at 9:31 PM Hyukjin Kwon wrote:
>
> > Thanks Martin for your feedback.
> >
> > > What was your reason to migrate from Apache Jenkins to Github Actions ?
> >
> > I am sure there were more reasons for migrating from AMPLab Jenkins
> > to GitHub Actions, but as far as I can remember:
> > - To reduce the maintenance cost of machines
> > - The Jenkins machines became unstable and slow, causing CI jobs to fail
> > or be very flaky.
> > - Difficulty managing the installed libraries.
> > - Intermittent unknown issues in the machines
> >
> > Yes, one option might be to consider migrating again.
> > However, other projects will very likely suffer from the
> > same problem. In addition, migrating a large project is not
> > easy work.
> >
> > I would like to know the feasibility of having more resources in GitHub
> > Actions, or, for example, having sub-groups where
> > each group shares the resources - currently one GitHub organisation
> > shares all resources across the projects.
> >
> >
> > On Wed, Apr 7, 2021 at 10:04 PM, Martin Grigorov wrote:
> >
> >>
> >>
> >> On Wed, Apr 7, 2021 at 3:41 PM Hyukjin Kwon wrote:
> >>
> >>> Hi Greg,
> >>>
> >>> I raised this thread to figure out a way that we can work together to
> >>> resolve this issue, gather feedback, and understand how other projects
> >>> work around it.
> >>> Several projects I observed, as far as I can tell, have made enough
> >>> effort to save resources in GitHub Actions but still suffer from the
> >>> lack of resources.
> >>>
> >>
> >> And it will get even worse because:
> >> 1) more and more Apache projects migrate from TravisCI to Github Actions
> >> (GA)
> >> 2) new projects join ASF and many of them already use GA
> >>
> >>
> >> What was your reason to migrate from Apache Jenkins to Github Actions ?
> >> If you want dedicated resources then you will need to manage the CI
> >> yourself.
> >> You could use Apache Jenkins/Buildbot with dedicated agents for your
> >> project.
> >> Or you could set up your own CI infrastructure with Jenkins, DroneIO,
> >> Concourse CI, ...
> >>
> >> Yet another option is to move to CircleCI or Cirrus. They are similar to
> >> TravisCI / GA and less crowded (for now).
> >>
> >> Martin
> >>
> >>> I appreciate the resources provided to us but that does not resolve the
> >>> issue of the development being slowed down.
> >>>
> >>>
> >>> On Wed, Apr 7, 2021 at 5:52 PM, Greg Stein wrote:
> >>>
> >>> > On Wed, Apr 7, 2021 at 12:25 AM Hyukjin Kwon wrote:
> >>> >
> >>> >> Hi all,
> >>> >>
> >>> >> I am an Apache Spark PMC,
> >>> >
> >>> >
> >>> > You are a member of the Apache Spark PMC. You are *not* a PMC. Please
> >>> > stop with that terminology. The Foundation has about 200 PMCs, and you
> >>> > are a member of one of them. You are NOT a "PMC" .. you're a person.
> >>> > A PMC is a construct of the Foundation.
> >>> >
> >>> > >...
> >>> >
> >>> >> I am aware of the limited GitHub Actions resources that are shared
> >>> >> across all projects in ASF,
> >>> >> and many projects suffer from it. This issue significantly slows
> >>> >> down the development cycle of
> >>> >> other projects, at least Apache Spark.
> >>> >>
> >>> >
> >>> > And the Foundation gets those build minutes for GitHub Actions
> >>> > provided to us from GitHub and Microsoft, and we are thankful that they

Re: Increase the number of parallel jobs in GitHub Actions at ASF organization level

2021-04-08 Thread shane knapp ☠
On Wed, Apr 7, 2021 at 6:30 AM Hyukjin Kwon wrote:

> Thanks Martin for your feedback.
>
> > What was your reason to migrate from Apache Jenkins to Github Actions ?
>
> I am sure there were more reasons for migrating from AMPLab Jenkins
> to GitHub Actions, but as far as I can remember:
> - To reduce the maintenance cost of machines
> - The Jenkins machines became unstable and slow causing CI jobs to fail or
> be very flaky.
> - Difficulty managing the installed libraries.
> - Intermittent unknown issues in the machines
>
also:

- uc berkeley has been hosting the build system for spark for ~10 years
"free of charge"
- funding for the build system is going away (amplab funded first, riselab
second)
- i have been managing the build system solo for 7 years and my job is much
different now...
- since there are no funds coming from research labs, i am unable to staff
the build system past 2021 (tbh, even this year is a stretch)
- the hardware is far past EOL and literally falling over
- jenkins is, and always will be a PITA to run

shane
-- 
Shane Knapp
Computer Guy / Voice of Reason
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu