Ryan,

Thanks for the detailed info; it’s great to have a better picture of our 
buildbot setup!

Since we’re overcommitting on CPU, I’m wondering if it would make sense to 
reduce the vCPU count in each VM to 4? Beyond reducing any swapping (fewer 
parallel build jobs means less memory pressure), that might also cut the 
hypervisor’s context-switching overhead and improve build times somewhat. 
(It’s been a while, but I think (?) there may be additional hypervisor 
overhead from overcommitment.)
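
A rough back-of-envelope of the overcommit ratios, based on the layout in 
your message below (a sketch only; it assumes 8 vCPUs for every build and 
buildmaster VM, 4 for the 10.6 i386 builder, and 16 logical cores per host):

# Hypothetical sketch: per-Xserve vCPU overcommit, before and after capping
# each VM at 4 vCPUs. VM lists follow the consolidated layout quoted below.
LOGICAL_CORES = 16  # 2 x 4-core CPUs, hyperthreaded

servers = {
    # server: vCPUs per VM (assumed from the quoted description)
    "R": [4, 8, 8, 8, 8, 8, 8, 8],  # 10.6i, 10.6x, 10.9, 10.15, 10.7, 10.10, 10.13, backup
    "S": [8, 8, 8],                 # 10.8, 10.11, 10.14
    "M": [8, 8, 8, 8],              # 10.12, 11x, buildmaster/files, buildmaster2
}

for name, vcpus in servers.items():
    now = sum(vcpus)
    reduced = sum(min(v, 4) for v in vcpus)  # cap every VM at 4 vCPUs
    print(f"{name}: {now}/{LOGICAL_CORES} = {now / LOGICAL_CORES:.2f}x now, "
          f"{reduced}/{LOGICAL_CORES} = {reduced / LOGICAL_CORES:.2f}x at 4 vCPUs/VM")

If I have the VM counts right, server R in particular would go from roughly 
3.75x to 2x overcommitted.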

Thoughts?

-Chris

> On 2021-05-14-F, at 00:54, Ryan Schmidt <ryandes...@macports.org> wrote:
> 
> On May 12, 2021, at 07:41, Christopher Nielsen wrote:
>> 
>> On 2021-05-12-W, at 08:32, Christopher Nielsen wrote:
>> 
>>> Looking at the build times for various ports, they vary significantly.
>>> 
>>> I was curious: are we overcommitting virtual CPUs vs. the number of 
>>> available physical cores on our Xserves? And is disk swapping coming into 
>>> play within the VMs themselves?
> 
> For most ports, I don't believe swapping within the VM occurs. py-tensorflow 
> and other ginormous ports that exceed MacPorts' expectations about compiler 
> memory use are exceptions. On the VMware ESXi side, all VMs have 100% of 
> their RAM reserved, so no swapping occurs there.
> 
> 
>> To clarify my question about overcommitment: Is the total number of virtual 
>> CPUs for the buildbot VMs running on a given Xserve greater than the number 
>> of physical CPU cores available?
> 
> Yes, we are overcommitting CPU. Each VM has 8 virtual CPUs (the maximum 
> VMware ESXi allows without a paid license) and typically 8 GB RAM (except 
> the 10.6 i386 builder, which has only 4 CPUs and 4 GB RAM, the maximum for 
> 32-bit). Each Xserve has two 4-core processors, presenting as 16 
> hyperthreaded cores. Normally we would have 3-4 VMs on each Xserve:
> 
> R (2.66GHz 32GB): 10.6i, 10.6x, 10.9, 10.15 (SSD)
> A (2.26GHz 32GB): 10.7, 10.10, 10.13 (SSD), backup (HD)
> S (2.26GHz 32GB): 10.8 (HD), 10.11 (HD), 10.14 (HD)
> M (2.26GHz 27GB): 10.12, 11x, buildmaster/files, buildmaster2 (SSD)
> 
> Server R's fan array failed two weeks ago. I turned off server A and put its 
> fan array, SSD and RAM into server R, so it now runs 7 build VMs plus the 
> backup VM:
> 
> R (2.66GHz 64GB): 10.6i, 10.6x, 10.9, 10.15 (SSD), 10.7, 10.10, 10.13 (SSD), 
> backup (HD)
> S (2.26GHz 32GB): 10.8 (HD), 10.11 (HD), 10.14 (HD)
> M (2.26GHz 27GB): 10.12, 11x, buildmaster/files, buildmaster2 (SSD)
> 
> When they're all fully busy, that will certainly be slower than when more 
> CPUs were available on the two separate servers. But this seemed to be 
> working pretty well, up until the huge batches of builds a few days ago (my 
> updates to 3 php versions' worth of subports, followed by gcc updates and 
> forced builds of everything depending on gcc), which have resulted in a 
> backlog (on all servers, even those that were not consolidated).
> 
> Redistributing the VMs to balance the load better is possible, for example:
> 
> R (2.66GHz 48GB): 10.6i, 10.6x, 10.9, 10.15 (SSD), 10.11 (HD), backup (HD)
> A (2.26GHz 48GB): 10.7, 10.10, 10.13 (SSD), 10.8 (HD), 10.14 (HD)
> M (2.26GHz 27GB): 10.12, 11x, buildmaster/files, buildmaster2 (SSD)
> 
> After seeing that using just 3 servers seemed to work, I had planned to do 
> this, but of course I need to wait until the builders are idle.
> 
> Replacing the failed fan array and going back to 4 servers is possible, 
> though I do like the idea of running fewer servers (less noise, less 
> electricity).
> 
> Replacing server S's hard drives with an SSD is possible.
> 
> Upgrading the CPUs in one or more servers to faster, more efficient 6-core 
> Westmere models is possible.
