Re: Reasons for job termination
I just wanted to add my findings in case somebody else is looking for a solution to a similar problem.

It turned out that we have a second Jenkins job running on the same machine, mostly unrelated to the first job that was getting killed. The second job wants to start a process that can only work if it isn't already running. It therefore looks for processes with a certain name and kills any it finds. Unfortunately, this pattern also matched a process of the first job, which the second job then killed, assuming it was its own still-running process. As this had nothing to do with Jenkins, it also didn't show up in the logs.

So it wasn't a Jenkins error or a resource problem but simply human error.

Thanks for any help and sorry for the noise.

-- 
You received this message because you are subscribed to the Google Groups "Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/jenkinsci-users/f56d99d5-55d5-4178-b27b-da9cafa52bdfo%40googlegroups.com.
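One way to avoid the name collision described in the message above is to track the job's own process through a PID file instead of matching process names. The following is only a minimal sketch under stated assumptions, not what the poster actually did: the helper names (`read_pid`, `is_running`, `write_pid`, `stop_previous_instance`) and the PID file location are hypothetical, and it assumes a Unix-like agent. Note that PIDs can be reused, so a stale PID file could still point at an unrelated process.

```python
import os
import signal
from pathlib import Path
from typing import Optional


def read_pid(pid_file: Path) -> Optional[int]:
    """Return the PID stored in pid_file, or None if missing or invalid."""
    try:
        text = pid_file.read_text().strip()
    except FileNotFoundError:
        return None
    try:
        pid = int(text)
    except ValueError:
        return None
    # Reject 0 and negative values: os.kill() would treat them as
    # "process group" or "every process", which is exactly what we must avoid.
    return pid if pid > 0 else None


def is_running(pid: int) -> bool:
    """Check whether a process with this PID exists (signal 0 sends nothing)."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def write_pid(pid_file: Path, pid: int) -> None:
    """Record the PID of the process this job started."""
    pid_file.write_text(f"{pid}\n")


def stop_previous_instance(pid_file: Path) -> bool:
    """Send SIGTERM only to the process recorded in pid_file.

    Returns True if a signal was sent, False if nothing was running.
    """
    pid = read_pid(pid_file)
    if pid is None or not is_running(pid):
        return False
    os.kill(pid, signal.SIGTERM)
    return True
```

A job would call `stop_previous_instance(pid_file)` before starting its process and `write_pid(pid_file, proc.pid)` right after, so that it only ever signals a process it started itself.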
Re: Reasons for job termination
Thanks for the hint. That's certainly something we can look into. I would have guessed that a lost connection would show up in the system log, but it might not. At least I can try to improve the situation now.

Thanks again

On Friday, July 3, 2020 at 9:51:23 AM UTC+2, Gianluca wrote:
> Hi,
> what you describe sounds like something we experienced.
> The issue in our case was that the Jenkins agents were VMs running on an overloaded host with network issues.
> A combination of network errors, unresponsive agents and IP exhaustion made Jenkins terminate the jobs with SIGTERM when it was unable to restore the connection with the agent.
> It was hard to find because the host running the VMs was overloaded only while the agents were doing something, so the sequence was something like:
> agent was ok -> agent started to build a job -> job spawned other VMs for testing -> host got overloaded -> agent could not run properly -> Jenkins lost connection with agent -> job got terminated -> host no longer overloaded -> agent ok again -> Jenkins restored connection with agent.
>
> On Friday, 3 July 2020 08:19:22 UTC+1, fabian wrote:
>> Hi
>>
>> We've been using Jenkins for years now. Recently a problem has come up that I can't explain. Jobs started to get terminated for no apparent reason. With a signal handler I found that it's apparently the Jenkins user that is sending the SIGTERM to the running process.
>>
>> What are reasons for Jenkins to stop a job?
>>
>> There is no second build being started and it's throttled anyway. The build timeout plugin is installed but this is a pipeline job where it doesn't work. And I don't use the timeout options in the pipeline.
>> I don't see anything in the Jenkins log at that time.
>>
>> How can I find out why the job is killed?
>>
>> Thanks
>>
>> bye Fabi
Re: Reasons for job termination
Hi,
what you describe sounds like something we experienced.

The issue in our case was that the Jenkins agents were VMs running on an overloaded host with network issues. A combination of network errors, unresponsive agents and IP exhaustion made Jenkins terminate the jobs with SIGTERM when it was unable to restore the connection with the agent.

It was hard to find because the host running the VMs was overloaded only while the agents were doing something, so the sequence was something like:
agent was ok -> agent started to build a job -> job spawned other VMs for testing -> host got overloaded -> agent could not run properly -> Jenkins lost connection with agent -> job got terminated -> host no longer overloaded -> agent ok again -> Jenkins restored connection with agent.

On Friday, 3 July 2020 08:19:22 UTC+1, fabian wrote:
> Hi
>
> We've been using Jenkins for years now. Recently a problem has come up that I can't explain. Jobs started to get terminated for no apparent reason. With a signal handler I found that it's apparently the Jenkins user that is sending the SIGTERM to the running process.
>
> What are reasons for Jenkins to stop a job?
>
> There is no second build being started and it's throttled anyway. The build timeout plugin is installed but this is a pipeline job where it doesn't work. And I don't use the timeout options in the pipeline.
> I don't see anything in the Jenkins log at that time.
>
> How can I find out why the job is killed?
>
> Thanks
>
> bye Fabi
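The original question mentions using a signal handler to find out who sent the SIGTERM. The poster's actual handler is not shown in the thread; the following is only a sketch of one way to get that information in Python on a Unix-like system, using `signal.sigwaitinfo` / `signal.sigtimedwait`, which report the sender's PID and UID. The function names `wait_for_sigterm` and `describe_sender` are hypothetical.

```python
import os
import pwd
import signal
from typing import Optional


def describe_sender(info) -> str:
    """Format the sender details carried by a struct_siginfo object."""
    try:
        user = pwd.getpwuid(info.si_uid).pw_name
    except KeyError:
        # No passwd entry for this UID; fall back to the number.
        user = str(info.si_uid)
    return f"signal {info.si_signo} from pid {info.si_pid} (user {user})"


def wait_for_sigterm(timeout: Optional[float] = None):
    """Block SIGTERM, then wait for it synchronously.

    Returns a struct_siginfo, or None if the timeout expires first.
    The signal must be blocked so it is queued instead of killing us.
    """
    sigset = {signal.SIGTERM}
    signal.pthread_sigmask(signal.SIG_BLOCK, sigset)
    if timeout is None:
        return signal.sigwaitinfo(sigset)
    return signal.sigtimedwait(sigset, timeout)
```

A wrapper around the build step could call `wait_for_sigterm()` in its main thread and log `describe_sender(info)` before shutting down. The reported PID can then be looked up (for example under `/proc/<pid>/cmdline` on Linux, if the sender is still alive) to see which process, rather than just which user, sent the signal.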