STINNER Victor <vstin...@python.org> added the comment:
Charris, Pablo and I identified that TCP connections are closed by the load balancer on some buildbot workers.

When the "buildbot.python.org" host name is used, TCP connections (TCP port 9020) go through a load balancer. Ernest exposed TCP port 9020 directly to the Internet (without the load balancer) under a new host name: "buildbot-api.python.org".

Buildbot workers should be updated to use "buildbot-api.python.org". I also suggest using a keepalive of 60 seconds rather than 600 seconds.

If your worker was affected by this issue, I strongly advise you to clean up the temporary directory (/tmp) manually. When a worker was disconnected, the build was interrupted without removing its temporary files. On some workers, we found around 20 GB of temporary files in /tmp: "ccXXXX" files and "tmpXXXX" files. I guess that some of these files come from the compiler and others from the Python test suite.

I updated the buildbot client configuration of the 9 workers operated by Red Hat:

Fedora Rawhide x86-64
Fedora Stable x86-64
RHEL8 x86-64
RHEL7 x86-64
RHEL8 FIPS x86-64
Fedora Rawhide AArch64
Fedora Stable AArch64
RHEL 8 ppc64le
RHEL 7 ppc64le

On our workers, I used the following commands:

    systemctl stop buildbot-worker.service
    du -sh /tmp; rm -f /tmp/{cc,tmp}*; du -sh /tmp
    sed -i -e "s/buildmaster_host = 'buildbot.python.org'/buildmaster_host = 'buildbot-api.python.org'/;s/keepalive = .*/keepalive = 60/" /home/buildbot/buildarea/buildbot.tac; grep -E '(host|keepalive) =' /home/buildbot/buildarea/buildbot.tac
    systemctl start buildbot-worker.service
    systemctl status buildbot-worker.service

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue41642>
_______________________________________
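
To preview what the sed expression in the commands above changes before running it on a real worker, it can be tried on a scratch file first. This is only a sketch: the sample file contents (including the `port = 9020` line and its layout) are an assumption made for illustration, not a copy of a real buildbot.tac. The expression itself is the one from the message; `-i` is dropped so that sed prints the result instead of editing the file in place.

```shell
#!/bin/sh
# Scratch file standing in for buildbot.tac (hypothetical contents).
tac_file=$(mktemp)
cat > "$tac_file" <<'EOF'
buildmaster_host = 'buildbot.python.org'
port = 9020
keepalive = 600
EOF

# Same substitutions as in the message, written to stdout rather than in place.
# grep then keeps only the host and keepalive settings for a quick check.
result=$(sed -e "s/buildmaster_host = 'buildbot.python.org'/buildmaster_host = 'buildbot-api.python.org'/;s/keepalive = .*/keepalive = 60/" "$tac_file" \
    | grep -E '(host|keepalive) =')

printf '%s\n' "$result"

# Remove the scratch file; the real configuration was never touched.
rm -f "$tac_file"
```

If the two printed settings look right, the in-place form (`sed -i`) from the message can be run against the actual file while the worker service is stopped.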