Hi, As some of you are aware, our Jenkins nodes run into trouble very often. We are now seeing beam4, 7 and 13 disconnected, with jobs failing for a similar reason: OOM (out of memory). I am asking ASF Infra to dump the resource usage (memory, disk, CPU...) from those machines before we reboot them. In the meantime, you may see longer wait times for jobs because fewer agents are available. We are sorry about that.
I've recorded some Jenkins console logs from those agents in a doc. Please let us know if you have any insight into these problems. Any help is appreciated.
https://docs.google.com/document/d/1OBmWumaJCuHPNMHM4-V6JWhYY2ZV_ETqUdFQG5A7R4E/edit?usp=sharing
Regards,
Yifan
