After googling, I tried a few things:

- Memory has correct timings, frequencies and voltage => no improvement
- Kernel parameters => no improvement
  - idle=nomwait
  - processor.max_cstate=5
  - rcu_nocbs=0-11
- Undervolting / Overclocking => seems to make the system a bit more stable
  - Reducing PPT to 45 W
  - PBS Curve all cores: -10
  - Boost limit: -300 (ending around 4 GHz)
- Deactivating SMT => no improvement
- Deactivating selected CPUs (the error always showed up on CPU5) => no improvement
- Deactivating tx, sg, tso offloading => no improvement

Overall, it seems the system crashes during load changes, e.g. while compiling. It then takes SATA, network, etc. down, leaving the system unusable.
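
When a kernel parameter shows "no improvement", it can be worth confirming that it actually reached the running kernel, since a bootloader config that was edited but not regenerated fails silently. The following is a minimal Python sketch, assuming a Linux system where `/proc/cmdline` is readable. The helper name `check_params` and the `EXPECTED` list are made up for illustration. It only checks that each parameter is present on the command line, not that the kernel honored it.

```python
from pathlib import Path

# Parameters from the list above that should appear on the kernel command line.
EXPECTED = ["idle=nomwait", "processor.max_cstate=5", "rcu_nocbs=0-11"]


def check_params(cmdline: str, expected=EXPECTED) -> dict:
    """Return {param: True/False}, True if the parameter appears as a
    whole whitespace-separated token in the given kernel command line."""
    tokens = set(cmdline.split())
    return {param: param in tokens for param in expected}


if __name__ == "__main__":
    # /proc/cmdline holds the command line the running kernel was booted with.
    cmdline = Path("/proc/cmdline").read_text()
    for param, present in check_params(cmdline).items():
        print(f"{param}: {'present' if present else 'MISSING'}")
```

If a parameter is reported as missing, the test for that parameter was not valid and would need to be repeated after fixing the bootloader entry.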