[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-27 Thread STINNER Victor
STINNER Victor added the comment: The buildbot server migrated to a new machine and is now behind a load balancer. tcp/80 (buildbot web page, HTTP) and tcp/9020 (used by buildbot workers) are both behind the load balancer. Maybe the load balancer closes TCP connections which are idle for 60
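If a load balancer does drop idle TCP connections, one generic mitigation is to have the kernel send keepalive probes more often than the idle timeout. The following is only a sketch of that idea for a plain Python socket; it is not how buildbot configures its connections, and the function name and the default values are assumptions chosen for illustration. Whether a given load balancer counts keepalive probes as activity is also an assumption that would need checking.

```python
import socket


def enable_keepalive(sock, idle_seconds=30, interval_seconds=10, probes=3):
    """Ask the kernel to send TCP keepalive probes on an otherwise idle socket.

    All default values are illustrative assumptions, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The TCP_KEEP* constants are platform-specific (they exist on Linux),
    # so each one is applied only when this Python build exposes it.
    for name, value in (
        ("TCP_KEEPIDLE", idle_seconds),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval_seconds),  # delay between probes
        ("TCP_KEEPCNT", probes),              # failed probes before giving up
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

The idle time would have to be shorter than whatever timeout the load balancer really applies.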

[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-27 Thread STINNER Victor
STINNER Victor added the comment: On the worker (client) side, I see many "lost remote step" every 1 to 3 minutes. Example with the PPC64LE Fedora Stable (cstratak-fedora-stable-ppc64le) worker:
2020-08-27 01:30:09-0400 [Broker,client] lost remote step
2020-08-27 01:31:57-0400
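To check how regular these disconnections are, the "lost remote step" timestamps can be extracted from a worker log and the gaps between them computed. This is a minimal sketch that assumes every log line starts with a timestamp in the format shown above; the function names are hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
#   2020-08-27 01:30:09-0400 [Broker,client] lost remote step
LOST_STEP = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}) \[[^\]]*\] lost remote step"
)


def lost_step_times(lines):
    """Return the timestamps of all "lost remote step" log lines."""
    times = []
    for line in lines:
        match = LOST_STEP.match(line)
        if match:
            times.append(datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S%z"))
    return times


def gaps_in_seconds(times):
    """Return the delay in seconds between consecutive timestamps."""
    return [(later - earlier).total_seconds()
            for earlier, later in zip(times, times[1:])]
```

Gaps that cluster around one fixed value would be consistent with an idle timeout somewhere between the worker and the server, although they would not prove it.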

[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-27 Thread STINNER Victor
STINNER Victor added the comment: > I have found a large number of un-removed files in /tmp. Right. I found many /tmp/cc.XXX and /tmp/tmpX files. Around 20 GB of these files! Maybe passing "-pipe" to gcc/clang would avoid the /tmp/cc.XXX files when a build is interrupted.
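A different approach from "-pipe" (not one proposed in this thread) is to give each build its own temporary directory through the TMPDIR environment variable, which gcc honors, and to delete that directory once the build command returns. The sketch below assumes the build is started from Python; the helper name is hypothetical, and the cleanup does not happen if the Python process itself is killed abruptly.

```python
import os
import subprocess
import tempfile


def run_with_private_tmpdir(command, **kwargs):
    """Run command with TMPDIR pointing at a throwaway directory.

    The directory and anything left in it are removed when the command
    returns, including when it fails or raises.
    """
    with tempfile.TemporaryDirectory(prefix="build-tmp-") as tmpdir:
        env = dict(os.environ, TMPDIR=tmpdir)
        return subprocess.run(command, env=env, **kwargs)
```

For example, `run_with_private_tmpdir(["make", "-j4"])` would keep the compiler's temporary files out of the shared /tmp.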

[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-27 Thread David Edelsohn
David Edelsohn added the comment: I have found a large number of un-removed files in /tmp. Things seem to function better with Buildbots running older 0.x "buildslave" as opposed to newer "buildbot-worker" instances. -- nosy: +David.Edelsohn title: Buildbot: workers detached every

[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-26 Thread STINNER Victor
STINNER Victor added the comment: Statistics on the partitions which are the most full. Fedora Rawhide x86-64 is ok:
/dev/mapper/vg_root_python--builder--rawhide.osci.io-root  14G  5,4G  7,6G  42%  /
/dev/mapper/vg_root_python--builder--rawhide.osci.io-home  36G   24G   11G  70%  /home
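The same kind of check can be scripted on a worker with the standard library. This is a rough sketch: the function names and the 90% threshold are assumptions, and the percentage may differ slightly from the one printed by df.

```python
import shutil


def percent_used(path):
    """Return the used share, in percent, of the filesystem containing path."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def nearly_full(paths, threshold=90.0):
    """Return the paths whose filesystem is at or above threshold percent."""
    return [path for path in paths if percent_used(path) >= threshold]
```

A call such as `nearly_full(["/", "/home", "/tmp"])` would list the mount points that need attention before a build starts.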

[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-26 Thread STINNER Victor
STINNER Victor added the comment: python-builder-rawhide had its /tmp partition full of temporary "cc.XXX" files. Before: /tmp was full at 100% (3.9 GB). After sudo rm -f /tmp/cc*, only 52 KB are used (1%). I'm not sure why gcc/clang left so many temporary files :-/ There are many
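Before deleting anything, it can help to list which leftover files are old enough that no running compiler can still be using them. This is a minimal sketch under that assumption; the function name, the "cc*" default pattern, and the one-day age limit are illustrative choices, not something taken from the workers' configuration.

```python
import glob
import os
import stat
import time


def stale_files(directory, pattern="cc*", min_age_seconds=24 * 3600, now=None):
    """Return sorted (path, size) pairs for old regular files matching pattern."""
    now = time.time() if now is None else now
    found = []
    for path in glob.glob(os.path.join(glob.escape(directory), pattern)):
        try:
            info = os.lstat(path)
        except FileNotFoundError:
            continue  # the file vanished between listing and stat
        # Skip directories and symlinks; keep only sufficiently old files.
        if stat.S_ISREG(info.st_mode) and now - info.st_mtime >= min_age_seconds:
            found.append((path, info.st_size))
    return sorted(found)
```

Summing the sizes shows how much space a cleanup would free, and the returned paths can then be passed to `os.remove` once the list looks right.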

[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-26 Thread Charalampos Stratakis
Charalampos Stratakis added the comment: There were almost 10GB of remnant cc* files in /tmp from the compilers used, which I presume were also the temporary artifacts which remained there after the disconnects. Cleaned those up and rebooted the RHEL8 x86_64 buildbot. --

[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-26 Thread Charalampos Stratakis
Charalampos Stratakis added the comment: There is an issue which I discovered after I returned from holidays: basically, the buildbot-worker keeps getting disconnected from the master, so builds start and end abruptly, retaining some artifacts. The next second it tried again with the same

[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-26 Thread STINNER Victor
STINNER Victor added the comment: > It seems many of the RHEL and Fedora builds fail due to disk space These workers have different owners and so need to reach different people. We should list all impacted workers. > https://buildbot.python.org/all/#/builders/185/builds/2 AMD64 RHEL8 3.x

[issue41642] RHEL and fedora buildbots fail due to disk space error

2020-08-26 Thread Karthikeyan Singaravelan
New submission from Karthikeyan Singaravelan: It seems many of the RHEL and Fedora builds fail due to disk space https://buildbot.python.org/all/#/builders/185/builds/2
./configure: line 2382: cannot create temp file for here-document: No space left on device
./configure: line 2394: cannot