Hi,
Based on my understanding, we have the standard GitHub Enterprise limit
of 180 parallel jobs at any given time [1].
Running 180 parallel jobs full-time would give us 180 * 24 * 30 (or 31)
build hours per month, i.e. 129,600 hours in a 30-day month and 133,920
hours in a 31-day month.
Doing some quick research, I found that Apache used about 90k hours in
2021-02 and 103k hours in 2021-03. This usage came from 84 Apache
projects in 2021-02, which increased to 91 projects in 2021-03.
The average number of parallel jobs running at any given time is around
120-130, which is under the limit. The problem is that we have spikes:
the highest one I found was 200 scheduled jobs (meaning 20 were queued
until some other jobs finished).
Another problem is that not all projects use the available capacity to
the same degree. With 31 days in March and 91 Apache projects, a fair
share would be ~1470 build hours per project per month (133,920 / 91).
There are 13 projects which use GitHub Actions above this level:
project          build hours  avg hours/job  max hours/job
nuttx           14138.818889       3.441777      17.845833
pulsar          10785.601944       0.478743       2.011667
airflow          8305.211111       1.247216      20.768056
skywalking       6852.736667       0.959095       7.520278
arrow            6290.633889       0.503573      24.359444
ozone            5484.444722       4.440846      17.473333
camel            4241.184722       0.754525      18.681389
iotdb            4007.576667       1.224061      36.676944
shardingsphere   2858.329444       0.503937      21.633056
beam             2782.366111       0.688364      46.451667
nifi             2380.907500       4.641145      11.313056
apisix           2342.009722       0.209183      24.139167
dubbo            1815.491389       1.537249      11.023889
These 13 projects account for 94% of all the build time.
(Note: the GitHub API has some very painful limitations: I couldn't
identify external runners, and some data can be missing for re-runs.)
There are signs of misconfiguration in some jobs. For example, in some
projects I found many failed jobs with >15 hour executions, even though
the slowest successful (!) execution took only a few hours. This clearly
shows that a job-level timeout is not yet configured.
Also, the 46 or 36 hours of maximum job execution time sounds very
unrealistic (it's a single job, not the full workflow).
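As a minimal sketch of the fix (the job name, the 60-minute value and
the build command are just placeholders; each project would pick a
limit slightly above its normal build time), a job-level timeout looks
like this:

    jobs:
      build:
        runs-on: ubuntu-latest
        # fail the job after 60 minutes instead of the default 360,
        # so a hanging build cannot burn hours for a whole day
        timeout-minutes: 60
        steps:
          - uses: actions/checkout@v2
          - name: Build
            run: ./build.sh   # placeholder build command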
My suggestions:
* Publish GitHub Actions usage in a central place which is clearly
visible to all Apache projects (I would be happy to volunteer here)
* Identify an official fair-usage suggestion (monthly hours) per
project (easiest way: available hours / number of projects using GitHub
Actions)
* Create a wiki page collecting all the practices for reducing hours
(the PR cancel workflow discussed earlier + timeouts + ...? see the
sketch after this list)
* After every month, send a very polite reminder (on their dev lists)
to the projects that overuse GitHub Actions, including detailed
statistics and the wiki link to help them improve/reduce their usage.
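For that wiki page, one candidate practice is cancelling superseded PR
runs. A minimal sketch, assuming GitHub's built-in concurrency setting
is available on our plan (the action-based cancel workflow discussed
earlier achieves the same thing):

    # at the top level of the workflow file
    concurrency:
      # one group per workflow + branch/PR ref
      group: ${{ github.workflow }}-${{ github.ref }}
      # stop the already running build when a newer commit arrives
      cancel-in-progress: true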
What do you think about these ideas?
Thanks,
Marton
PS: my data is here: https://github.com/elek/asf-github-actions-stat
There could be bugs; feel free to ping me if you see any problems.
[1]
https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration