We're having a significant issue with our Jenkins environment. Jenkins is
running in Kubernetes, and sometimes the master is no longer able to
provision new nodes, while the existing nodes don't disconnect.

We're using version 1.10.1 of the Kubernetes plugin with Jenkins 2.121.2,
running in Google Cloud.

This issue first started at the end of July and has been growing worse
over the weeks. Each time we think we have found the cause and implemented
a fix, the issue comes back a few days later.

- At first the issue appeared to be load-related and affected by the
slave pods not having proper CPU/memory resource requests. They all have
proper requests now, and with better monitoring in place that no longer
seems to be related.
- Then it seemed like the JVM heap size might be too small, so we
increased it and monitored those values with Prometheus/Grafana. We don't
see much correlation between heap usage and the problem either.
- If we leave Jenkins untouched and don't start any more builds, the
problem seems to resolve itself over time. But on a typical working day,
with many developers pushing code, that is not a viable solution.
- The best workaround we have is to restart the Jenkins master when this
happens.

There is one thing of interest that I want to investigate more, but I'm not 
sure how to get better logging/details when the issue is happening.

As builds are executed in the k8s pods, the Jenkins master lists the
pod/job in the "Build Executor Status" table in the left column. Each
"stage" of the pipeline is shown there as the job progresses. When the
problem is happening, there will be 2 or more builds in this table, and
each of them will show "part" as the stage it is in. It's unclear
exactly what sets the stage to "part". With a better understanding of
that, we might have a better idea of what is causing this issue.
