Re: Reasons for job termination

2020-07-08 Thread fcenedese
I just wanted to add my findings in case somebody else is looking for a
solution to a similar problem.

It turned out that we have a second Jenkins job running on the same
machine, mostly unrelated to the first job that was getting killed. The
second job wants to start a process that can only work if it isn't
already running. It therefore looks for processes with a certain name
and kills any it finds. Unfortunately, this pattern also matched a
process of the first job and killed it, assuming it was its own
still-running process. And as this didn't have anything to do with
Jenkins itself, it also didn't show up in the logs.
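
To illustrate how a name pattern can over-match like this, here is a minimal Python sketch. It is not the script from the affected job; the process names, paths and helper functions are hypothetical and only show the difference between matching a substring anywhere in the command line and matching the executable name exactly.

```python
import os
import shlex


def substring_matches(pattern, command_lines):
    """Loose matching: select every command line that contains the pattern."""
    return [cmd for cmd in command_lines if pattern in cmd]


def exact_name_matches(name, command_lines):
    """Strict matching: select only commands whose executable is exactly `name`."""
    matches = []
    for cmd in command_lines:
        argv = shlex.split(cmd)
        if argv and os.path.basename(argv[0]) == name:
            matches.append(cmd)
    return matches


# Hypothetical process list: the second entry belongs to an unrelated job.
RUNNING = [
    "/usr/local/bin/simserver --port 9000",
    "/home/jenkins/workspace/job1/build/simserver_tests --all",
]

loose = substring_matches("simserver", RUNNING)    # selects both commands
strict = exact_name_matches("simserver", RUNNING)  # selects only the first
```

Recording the PID of the process a job starts (for example in a PID file) and killing only that PID would avoid name matching altogether.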

So it wasn't a Jenkins error or resource problem but simply human error.

Thanks for any help and sorry for the noise.

-- 
You received this message because you are subscribed to the Google Groups 
"Jenkins Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to jenkinsci-users+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-users/f56d99d5-55d5-4178-b27b-da9cafa52bdfo%40googlegroups.com.


Re: Reasons for job termination

2020-07-03 Thread fcenedese
Thanks for the hint. That's certainly something we can look into. I
would have guessed that a lost connection would show up in the system
log, but it might not. At least I can try to improve the situation now.

Thanks again


To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-users/e7a8753f-7aaf-4247-9639-c75d89a80deco%40googlegroups.com.


Re: Reasons for job termination

2020-07-03 Thread Gianluca
Hi,
what you describe sounds like something we experienced.
The issue in our case was that the Jenkins agents were VMs running on an
overloaded host with network issues.
A combination of network errors, unresponsive agents and IP exhaustion
made Jenkins terminate the jobs with SIGTERM when it was unable to
restore the connection with the agent.
It was hard to find because the host running the VMs was overloaded when
the agents were doing something, so the sequence was something like:
agent was ok -> agent started to build a job -> job spawned other VMs
for testing -> host got overloaded -> agent could not run properly ->
Jenkins lost connection with agent -> job got terminated -> host no
longer overloaded -> agent ok again -> Jenkins restored connection with
agent.


On Friday, 3 July 2020 08:19:22 UTC+1, fabian wrote:
>
> Hi 
>
> We've been using Jenkins for years now. Recently a problem has
> come up that I can't explain. Jobs started to get terminated for
> no apparent reason. With a signal handler I found that it's
> apparently the Jenkins user that is sending the SIGTERM to
> the running process.
>
> What are reasons for Jenkins to stop a job? 
>
> There is no second build being started and it's throttled anyway. 
> The build timeout plugin is installed but this is a pipeline job 
> where it doesn't work. And I don't use the timeout options in 
> the pipeline. 
> I don't see anything in the Jenkins log at that time.
>
> How can I find out why the job is killed? 
>
> Thanks 
>
> bye  Fabi 
>
>
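
The original question mentions using a signal handler to find out who sent the SIGTERM. The poster's actual handler is not shown, so the following is only a sketch of one way to get that information in Python on Linux: block SIGTERM and collect it with `signal.sigtimedwait`, which returns the sender's PID and UID. The function names are hypothetical, and `sigtimedwait` is not available on every platform (for example not on macOS).

```python
import os
import signal


def block_sigterm():
    """Block SIGTERM so it stays pending instead of terminating the process."""
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})


def wait_for_sigterm_sender(timeout_seconds):
    """Wait for a pending or incoming SIGTERM.

    Returns (sender_pid, sender_uid), or None if no SIGTERM arrived
    within the timeout.
    """
    info = signal.sigtimedwait({signal.SIGTERM}, timeout_seconds)
    if info is None:
        return None
    return info.si_pid, info.si_uid


def demo():
    """Send SIGTERM to ourselves and report who the sender was."""
    block_sigterm()
    os.kill(os.getpid(), signal.SIGTERM)
    return wait_for_sigterm_sender(1.0)
```

In a real build, `block_sigterm()` would be called at startup and the wait would run in a helper thread that logs the sender. The PID can then be looked up in the process table, which helps distinguish a kill issued by the Jenkins agent from one issued by another process that merely runs as the same user.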

To view this discussion on the web visit 
https://groups.google.com/d/msgid/jenkinsci-users/813456d7-1d87-4a40-b954-ddfd6c431c86o%40googlegroups.com.