Dear all,

As you are likely aware, we will soon be migrating from Gerrit to GitHub. Some of you have reached out to me with questions about the testing infrastructure on GitHub and how it may be used and improved. I shall address those questions in this email.

GitHub allows for testing via its “Actions” infrastructure. GitHub Actions consists of “Workflows”, each of which specifies a set of jobs to be executed on a specific event. GitHub Actions supports a wide range of these events, from pull request creation, through specific keywords appearing in comments, to simple scheduled runs and many more. The jobs a workflow specifies run commands in a series of steps, which makes Actions highly flexible for a large number of automation tasks. Some repos use it to automatically moderate discussion threads, and others use it to build and deploy their service. Right now we are just using it to run tests.

GitHub looks for workflows specified in yaml format in “.github/workflows”. Our current workflow yaml files can be found here: https://github.com/gem5/gem5/blob/ad0a2d1beaa043c03c0e43406078b3a09a3861ac/.github/workflows/. There are many resources online explaining how to create yaml files that specify jobs and trigger them on specific events, so I won’t go into further detail than this high-level description.

There is one small limitation with GitHub Actions for which we will need to change our procedures. GitHub only reads the yaml files on the repository’s main (default) branch, which in our case is the stable branch. This means that if we want to update which tests are run, or how they are run, we need to update the stable branch. After some discussion we believe the best policy will be to permit patches that change these yaml files to be submitted to the stable branch between releases. Since these files do not affect the compilation or running of gem5, the stable branch is still “stable” with respect to the end user's interaction with gem5.

Jobs run on “runners”. A runner is just a server which accepts GitHub jobs to run. Each runner runs one job at a time. Typically you would pay GitHub to use their runners; most actions complete in a matter of seconds and so incur little cost. That won’t work for us, as some of our tests take days to complete. Fortunately GitHub allows for “self-hosted runners”.

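To make the workflow structure described above concrete, here is a minimal sketch of a workflow file. It is not one of our actual workflows: the file name, job names, the custom labels “example-build” and “example-run”, and the placeholder commands are all hypothetical. Only the overall shape (events under “on”, jobs made of steps, and a “runs-on” field selecting the runner) reflects how GitHub Actions works.

```yaml
# Hypothetical file: .github/workflows/example-tests.yaml
# Illustrative only; names, labels, and commands are assumptions.
name: example-tests

# Events that trigger the workflow.
on:
  pull_request:
    types: [opened, synchronize]
  schedule:
    - cron: '0 2 * * *'   # a simple scheduled run, daily at 02:00 UTC

jobs:
  example-build-job:
    # "self-hosted" selects a self-hosted runner; "example-build" is a
    # hypothetical custom label assigned when the runner was registered.
    runs-on: [self-hosted, linux, x64, example-build]
    steps:
      - uses: actions/checkout@v3
      - name: Build (placeholder command)
        run: echo "build step goes here"

  example-test-job:
    runs-on: [self-hosted, linux, x64, example-run]
    steps:
      - uses: actions/checkout@v3
      - name: Run tests (placeholder command)
        run: echo "test step goes here"
```

A job is only picked up by a runner that carries every label listed in its “runs-on” field, so custom labels are one way to steer different jobs to different machines.
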
With tooling provided by GitHub you can set up a runner on any machine you want and point it towards the git repository it is to accept job requests from. There is one big problem with this: a self-hosted runner is not secure. With the right job specification you can execute whatever you want on the host hardware. A smaller annoyance is that GitHub makes it hard, but not impossible, to run more than one runner per machine, which is a problem when ideally you want several runners executing jobs in parallel on machines that can handle them.

Our solution to this is to set up runners in virtual machines. We attempted to use Kubernetes for this, but found it is more tailored towards large cloud-based clusters, whereas we want to use the smaller number of servers at our disposal. After some trial and error we decided it wasn’t the right tool for the job. Moving on from this, we opted to use Vagrant to create VMs to host the runners. I have documented all the scripts I used to do this here: https://gem5-review.googlesource.com/c/public/gem5/+/71098. You can consult the “README.md” for procedures to set up your own runners. Though I have created some scripts to semi-automate the process, it’s still quite manual. It would be nice if there were a more “push button” way to deploy runners. In a similar vein, if they break we have to manually go in and restart them. There’s room for improvement here.

Right now we have two types of VMs: “builders” and “runners”. Builders are 4-core 16GB VMs whose primary purpose is to build gem5. Runners are single-core 6GB VMs whose purpose is to run instances of gem5. Aiming for a rough 6 to 1 ratio, we have 26 runners and 4 builders spread over 3 machines, though this is very lopsided as 1 of our machines hosts 20 runners. In the yaml file the jobs are distributed to either a runner or a builder based on the “runs-on” field.

Though this setup is currently functional, it does have some restrictions and pain points.
Of note:

- We do not have a runner which can run KVM tests. For the time being these are skipped. We’re not sure how feasible it is to put a runner in a VM which will allow KVM.
- The weekly GPU tests need a special Docker container to be built within the tests, and we need more time to figure out how to do this. At present we get errors, but we are working on finding a solution.
- We do not have good tools to orchestrate these VMs. If they go down and need to be restarted, or new VMs need to be created, it requires manual effort.
- 20 of our runners are on a single machine. It’d be much better to have a more distributed set of runners.
- All our machines are X86. It may be of value to have some ARM hosts too, particularly to run ARM KVM.

If anyone reading this wants to help with the development of this infrastructure then I’d be happy to accommodate their input. I realize that many of the parts explained here can be improved. Using the scripts I provide here: https://gem5-review.googlesource.com/c/public/gem5/+/71098 you can set up your own runner and test out different setups on your own forks of the gem5 repository. We’d also welcome improvements to our yaml scripts to better utilize what we have and run better tests.

Kind regards,
Bobby

--
Dr. Bobby R. Bruce
Room 3050, Kemper Hall, UC Davis
Davis, CA, 95616
web: https://www.bobbybruce.net