Hi,


Based on my understanding, we have the standard Github Enterprise limit, which is 180 parallel jobs at a given time [1].

Running 180 parallel jobs full-time would give us 180 * 24 * 30 (or 31) build hours per month, which is 129600 hours in a 30-day month and 133920 hours in a 31-day month.

I did some quick research and found that in 2021-02 Apache used about 90k build hours, and in 2021-03 about 103k hours.

This usage came from 84 Apache projects, which increased to 91 in 2021-03.

The average number of parallel jobs running at a given time is around 120-130, which is under the limit. The problem is that we have spikes: the highest one I found was 200 jobs scheduled (with 20 queued until some running jobs finished).


Another problem is that not all projects use the available capacity at the same level:

With 31 days in March and 91 Apache projects, fair usage would be ~1470 build-hours per project per month (133920 / 91).
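
To make this arithmetic easy to check, here it is in one place (a quick sketch; all numbers are the ones quoted above):

    # Monthly build-hour capacity under the 180-parallel-job limit,
    # plus the fair share per project for March 2021 (91 projects, 31 days).
    parallel_jobs = 180
    capacity_30 = parallel_jobs * 24 * 30  # 129600 build hours
    capacity_31 = parallel_jobs * 24 * 31  # 133920 build hours
    fair_share = capacity_31 / 91          # ~1471 build hours per project

    print(f"30-day capacity: {capacity_30} hours")
    print(f"31-day capacity: {capacity_31} hours")
    print(f"fair share (March): {fair_share:.0f} hours")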

There are 13 projects which use Github Actions above this fair-usage level:


project           build hours   average hours per job   max hours per job
nuttx            14138.818889                3.441777           17.845833
pulsar           10785.601944                0.478743            2.011667
airflow           8305.211111                1.247216           20.768056
skywalking        6852.736667                0.959095            7.520278
arrow             6290.633889                0.503573           24.359444
ozone             5484.444722                4.440846           17.473333
camel             4241.184722                0.754525           18.681389
iotdb             4007.576667                1.224061           36.676944
shardingsphere    2858.329444                0.503937           21.633056
beam              2782.366111                0.688364           46.451667
nifi              2380.907500                4.641145           11.313056
apisix            2342.009722                0.209183           24.139167
dubbo             1815.491389                1.537249           11.023889


These 13 projects use 94% of all the build time.

(Note: there are very painful limitations of the Github API: I couldn't identify external runners, and some data can be missing in case of re-runs.)
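
For reference, the data collection is roughly this shape (a minimal sketch against the documented GitHub Actions REST API; the repository is a placeholder, GITHUB_TOKEN handling is simplified, and a real collector needs pagination, retries and rate-limit handling, which is where the painful parts are):

    import os
    from datetime import datetime

    import requests

    API = "https://api.github.com"
    HEADERS = {"Authorization": f"token {os.environ['GITHUB_TOKEN']}"}
    REPO = "apache/ozone"  # placeholder repository

    def ts(value):
        # GitHub timestamps look like 2021-03-01T12:00:00Z
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")

    # List the most recent workflow runs of one repository.
    runs = requests.get(f"{API}/repos/{REPO}/actions/runs",
                        headers=HEADERS, params={"per_page": 20}).json()["workflow_runs"]

    # Sum the wall-clock execution time of every job in those runs.
    total = 0.0
    for run in runs:
        jobs = requests.get(f"{API}/repos/{REPO}/actions/runs/{run['id']}/jobs",
                            headers=HEADERS).json()["jobs"]
        for job in jobs:
            if job.get("started_at") and job.get("completed_at"):
                delta = ts(job["completed_at"]) - ts(job["started_at"])
                total += delta.total_seconds() / 3600

    print(f"{REPO}: {total:.1f} job-hours in the last {len(runs)} runs")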


There are signs of misconfiguration in some jobs. For example, in some projects I found many failed jobs with >15-hour executions, even though the slowest successful (!) execution took only a few hours. This clearly shows that a job-level timeout is not yet configured.

Also, the 46 or 36 hours of max job execution time sounds very unrealistic (it's a single job, not the full workflow).
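
Adding a job-level timeout is a one-line fix. A minimal sketch using the documented timeout-minutes setting (the 60-minute value and the build step are only illustrations; each project should pick a limit slightly above its slowest successful run):

    jobs:
      build:
        runs-on: ubuntu-latest
        # Kill runaway jobs long before the 15+ hour range; 60 is
        # illustrative, set it just above the slowest successful run.
        timeout-minutes: 60
        steps:
          - uses: actions/checkout@v2
          - run: mvn -B verify  # illustrative build step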



My suggestions:

* Publish Github Actions usage in a central place which is clearly visible to all Apache projects (I would be happy to volunteer here)

* Identify an official fair-usage suggestion (monthly build-hours) per project (easiest way: available hours / number of projects using Github Actions)

* Create a wiki page collecting all the practices to reduce the hours (the PR cancel workflow discussed earlier + timeouts + ...?); a sketch of one such practice follows after this list

* After every month, send a very polite reminder to the projects that overuse Github Actions (via their dev lists), including detailed statistics and the wiki link, to help them improve/reduce their usage.
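
For example, one entry for that wiki page: superseded runs can be auto-cancelled with the workflow-level concurrency setting (a minimal sketch; this may or may not match the PR cancel workflow discussed earlier, and the group key is just one common choice):

    # Cancel the in-flight run when a newer push arrives for the same ref.
    concurrency:
      group: ${{ github.workflow }}-${{ github.ref }}
      cancel-in-progress: true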



What do you think about these ideas?

Thanks,
Marton

ps: my data is here: https://github.com/elek/asf-github-actions-stat
There could be bugs; feel free to ping me if you see any problems.


[1] https://docs.github.com/en/actions/reference/usage-limits-billing-and-administration
