Hi, I have a problem with the slave JVM randomly dying on our larger build machines. When these machines are under heavy load, the following error shows up at unpredictable times on the slave connection:
https://pastebin.com/raw/BaY2rJ7G

The likelihood of these errors increases with the number of builds running on the slave. If I allow only 20 executors, it occurs very rarely or never, while with 100 executors the slave is quite likely to disconnect this way once all executors have a job running on them.

The affected machines are dual-socket systems with two 18-core Xeon CPUs, for a total of 36 cores and 72 threads (HT). 384 GB of RAM (or more) is installed, of which 200 GB is assigned to a ramdisk (tmpfs). This ramdisk is used for the Jenkins workspace. The OS is Debian 8 (Jessie) with the 4.9 kernel from backports. The Jenkins version is 2.55 and the installed Java version is OpenJDK 1.8.0_121. The running builds are mostly larger C projects compiled with gcc, plus some LaTeX documentation.

Since it occurs only with many parallel builds running, I suspect we are hitting some kind of limit that causes the slave process to be terminated. However, there is nothing in the logs (journalctl, dmesg) hinting at that, and as far as I know neither the OOM killer nor ulimit uses SIGTERM for that purpose. The following limits are reported by `ulimit -a`:

https://pastebin.com/raw/RXXWnc49

Does anyone have an idea what might be the cause, or what else I could look at?
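In case it helps anyone check the same thing on their own machines, here is a rough sketch (Python, Linux only) of how the "hitting a limit" theory could be tested against the running slave process rather than against the shell's `ulimit -a`. It reads `/proc/<pid>/limits` and compares the number of open file descriptors with the soft limit. The function names `parse_limits`, `usage_ratio` and `report` are my own and not part of Jenkins, and the parsing assumes the columns are separated by at least two spaces.

```python
import os
import re


def _to_int(value):
    """Convert a limit column to int; "unlimited" becomes None."""
    return None if value == "unlimited" else int(value)


def parse_limits(text):
    """Parse the contents of /proc/<pid>/limits into {name: (soft, hard)}.

    Assumption: columns are separated by two or more spaces, which holds
    for typical values but is not guaranteed for extremely large numbers.
    """
    limits = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = re.split(r"\s{2,}", line.strip())
        if len(fields) < 3:
            continue
        limits[fields[0]] = (_to_int(fields[1]), _to_int(fields[2]))
    return limits


def usage_ratio(used, soft_limit):
    """Return used / soft_limit, or 0.0 if the limit is unlimited or zero."""
    if not soft_limit:
        return 0.0
    return used / soft_limit


def report(pid):
    """Print open-file usage of one process against its soft limit."""
    with open(f"/proc/{pid}/limits") as fh:
        limits = parse_limits(fh.read())
    # Each entry in /proc/<pid>/fd is one open file descriptor.
    open_fds = len(os.listdir(f"/proc/{pid}/fd"))
    soft, _hard = limits["Max open files"]
    print(f"pid {pid}: {open_fds} open files, soft limit {soft}, "
          f"ratio {usage_ratio(open_fds, soft):.2f}")
```

Calling `report(<pid of the slave JVM>)` repeatedly while the executors fill up would show whether the descriptor count approaches the limit before the disconnect. It needs to run as the same user as the slave (or as root), and it only covers open files. The process limit is counted per user rather than per process, so it would need a separate check.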