[JIRA] [docker-plugin] (JENKINS-30052) Take Docker host "offline" if it is unhealthy

2015-09-11 Thread stephenconno...@java.net (JIRA)
Title: Message Title
 
 
 
 
 
 
 
 
 
 
  
 
 stephenconnolly commented on  JENKINS-30052 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
  Re: Take Docker host "offline" if it is unhealthy  
 
 
 
 
 
 
 
 
 
 
Aside: 
Clarification on the RFE: 
 

You add the method returning Set to Node because Node represents the configuration state and the list of required node monitors should be configurable (probably you would introduce the UI via NodeProperty so perhaps all that method would do is build the Set by iterating all NodeProperty instances of the Node and then store the resulting Set in a field (for performance) and rebuild the set on Node.save().
 

You add the instance field containing all the passing NodeMonitor checks to Computer because Computer represents the live state of the connection. Reset the instance field to null on connect/disconnect.
 

You update the Computer instance field (via a method) from AbstractNodeMonitorDescriptor for visibility and also so that the background checks will keep updating. There is great potential for deadlock so likely there would be a need to rework how that code works so that we can update a subset of node monitors for a specific computer on demand while also being able to update all the node monitors for all computers periodically. Thus you want to consolidate that threading logic in AbstractNodeMonitorDescriptor
 
 
 
 
 
 
 
 
 
 
 
 
 
 

 
 Add Comment 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
 
 
 

 This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265) 
 
 
 
 
  
 
 
 
 
 
 
 
 
   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more

[JIRA] [docker-plugin] (JENKINS-30052) Take Docker host "offline" if it is unhealthy

2015-09-11 Thread stephenconno...@java.net (JIRA)
Title: Message Title
 
 
 
 
 
 
 
 
 
 
  
 
 stephenconnolly commented on  JENKINS-30052 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
  Re: Take Docker host "offline" if it is unhealthy  
 
 
 
 
 
 
 
 
 
 
So one of the issues with {{NodeMonitor}}s is that they are designed to run in a background thread and Core does not provide an API to await their completion. 
In theory calling AbstractNodeMonitorDescriptor.monitor(Computer) would allow getting the results in a blocking mode, but that method is protected, and thus you would need either hacks (bad) or a change to core (slow) to invoke it. 
Additionally the Super Crappy Cloud API doesn't give any hooks to do this, so specific support would have to be woven into each cloud implementation... until you find out that some of the NodeMonitor implementations need to convert the Computer into a Node which means that the node has to have completed provisioning (for correctly written cloud implementations)... 
OK, so we'll dodge that rabbit hole of an approach, the cloud implementation cannot invoke the NodeMonitor before completing provisioning... perhaps then the way is to make Core handle this? 
If I were making such a change in core, I would suggest adding a method to Node or Computer that returns a Set of node monitors that must return success for the node to be available. I would add a Set instance to Computer to track the node monitors that have a successful result as well as a time. Then have either Node or Computer's isAcceptingTasks() method return false if the instance variable does not containsAll(). When testing containsAll() and the map is null if there are any required node monitors, launch a thread for that specific computer (will require a helper method in AbstractNodeMonitorDescriptor) and set the map to an empty map. Then in AbstractNodeMonitorDescriptor have it update the map with results also. 
The net effect of that is that a Cloud implementation (or even regular slaves) could select a subset node monitors that must pass before any job is accepted by overriding the method. 
The alternative to all that is to add specific code into the docker plugin that runs the required checks via the shell before even returning the node... the disadvantage is that you would have to specifically code up each check... the advantage is that you would not poison the provisioning statistics... for this latter reason I would recommend that Kanstantsin Shautsou go with the dedicated support... by all means file an RFE against core for the enhancement I described above as that would be generally useful, but it would require some changes in the provisioning stats to reduce the risk of provisioning stats poisoning 
HTH 
 
 
 
 
 
 
 
 
 
 
 
 

 
 Add Comment 
 
 
 
 
 
 
 
 
 
  

[JIRA] [docker-plugin] (JENKINS-30052) Take Docker host "offline" if it is unhealthy

2015-09-10 Thread gentoo.inte...@gmail.com (JIRA)
Title: Message Title
 
 
 
 
 
 
 
 
 
 
  
 
 Kanstantsin Shautsou commented on  JENKINS-30052 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
  Re: Take Docker host "offline" if it is unhealthy  
 
 
 
 
 
 
 
 
 
 
stephenconnolly could you suggest something about this async monitors? 
 
 
 
 
 
 
 
 
 
 
 
 

 
 Add Comment 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
 
 
 

 This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265) 
 
 
 
 
  
 
 
 
 
 
 
 
 
   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] [docker-plugin] (JENKINS-30052) Take Docker host "offline" if it is unhealthy

2015-08-20 Thread gentoo.inte...@gmail.com (JIRA)
Title: Message Title
 
 
 
 
 
 
 
 
 
 
  
 
 Kanstantsin Shautsou commented on  JENKINS-30052 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
  Re: Take Docker host "offline" if it is unhealthy  
 
 
 
 
 
 
 
 
 
 
Sorry, i mean "how it can be implemented" 
 
 
 
 
 
 
 
 
 
 
 
 

 
 Add Comment 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
 
 
 

 This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265) 
 
 
 
 
  
 
 
 
 
 
 
 
 
   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] [docker-plugin] (JENKINS-30052) Take Docker host "offline" if it is unhealthy

2015-08-20 Thread pjdar...@java.net (JIRA)
Title: Message Title
 
 
 
 
 
 
 
 
 
 
  
 
 pjdarton commented on  JENKINS-30052 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
  Re: Take Docker host "offline" if it is unhealthy  
 
 
 
 
 
 
 
 
 
 
If that is true then the "force monitoring check" is failing to take the slave offline. I have had builds started on containers where "df -h" has shown the "/" partition (which contains /home/jenkins/, which is where the build slave process runs) to be 100% full. So, either the "monitoring check" does not believe that having $ {workspace} 
 on a partition that is 100% full is a problem, or it is failing to take problem slaves offline. 
For static slaves (not docker slaves), I have seen builds fail due to lack of disk space and then the slave be taken offline after a build has failed. I would guess that this "monitoring check" is done in a different thread to the one running the build and so the build can be started before the "monitoring check" has finished. This means that you get one build fail, but later builds avoid using that slave (until someone fixes the slave and brings it back online). 
What I would like is for the docker-plugin to be aware of these monitoring checks and, if a container fails the check, the plugin should avoid starting new containers of that type on that host. Ideally, the check should be run before the build starts, and the build not be started on a failing container at all, but it would be acceptable to have a single failed build as long as subsequent builds avoided the problem. 
 
 
 
 
 
 
 
 
 
 
 
 

 
 Add Comment 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
 
 
 

 This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265) 
 
 
 
 
  
 
 
 
 
 
 
 
 
   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] [docker-plugin] (JENKINS-30052) Take Docker host "offline" if it is unhealthy

2015-08-20 Thread gentoo.inte...@gmail.com (JIRA)
Title: Message Title
 
 
 
 
 
 
 
 
 
 
  
 
 Kanstantsin Shautsou commented on  JENKINS-30052 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
  Re: Take Docker host "offline" if it is unhealthy  
 
 
 
 
 
 
 
 
 
 
in theory it should do force monitoring check after slave connected, but not accepted build on executor. 
 
 
 
 
 
 
 
 
 
 
 
 

 
 Add Comment 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
 
 
 

 This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265) 
 
 
 
 
  
 
 
 
 
 
 
 
 
   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] [docker-plugin] (JENKINS-30052) Take Docker host "offline" if it is unhealthy

2015-08-20 Thread pjdar...@java.net (JIRA)
Title: Message Title
 
 
 
 
 
 
 
 
 
 
  
 
 pjdarton commented on  JENKINS-30052 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
  Re: Take Docker host "offline" if it is unhealthy  
 
 
 
 
 
 
 
 
 
 
Would it be possible to make Jenkins check "exactly what jenkins slave.jar see" before it runs the rest of the build? (or, if not before, at least before the slave is disposed of) 
If the docker-plugin could see "exactly what jenkins slave.jar see" after it had started a container, it could then see if the container was unhealthy and take that cloud entry offline if it saw that newly-created containers on that cloud were unhealthy. It would be acceptable if this was done after a build had failed, if that could then prevent future builds from failing from the same problem - what I want is for Jenkins to not make the same mistake twice, so that if a machine runs out of space then Jenkins will run fewer builds at once instead of having all builds fail. 
(I admit that this is not an easy thing to do, but running out of disk space is a problem that I frequently encounter with Jenkins - this is a real problem when there are many builds and many users) 
 
 
 
 
 
 
 
 
 
 
 
 

 
 Add Comment 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
 
 
 

 This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265) 
 
 
 
 
  
 
 
 
 
 
 
 
 
   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] [docker-plugin] (JENKINS-30052) Take Docker host "offline" if it is unhealthy

2015-08-20 Thread gentoo.inte...@gmail.com (JIRA)
Title: Message Title
 
 
 
 
 
 
 
 
 
 
  
 
 Kanstantsin Shautsou commented on  JENKINS-30052 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
  Re: Take Docker host "offline" if it is unhealthy  
 
 
 
 
 
 
 
 
 
 
Monitor is checking exactly what jenkins slave.jar see, for cloud there is no way to predict what will be mounted in container. 
 
 
 
 
 
 
 
 
 
 
 
 

 
 Add Comment 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
 
 
 

 This message was sent by Atlassian JIRA (v6.4.2#64017-sha1:e244265) 
 
 
 
 
  
 
 
 
 
 
 
 
 
   





-- 
You received this message because you are subscribed to the Google Groups "Jenkins Issues" group.
To unsubscribe from this group and stop receiving emails from it, send an email to jenkinsci-issues+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[JIRA] [docker-plugin] (JENKINS-30052) Take Docker host "offline" if it is unhealthy

2015-08-20 Thread pjdar...@java.net (JIRA)
Title: Message Title
 
 
 
 
 
 
 
 
 
 
  
 
 pjdarton created an issue 
 
 
 
 
 
 
 
 
 
 

 
 
 
 
 
 
 
 Jenkins /  JENKINS-30052 
 
 
 
  Take Docker host "offline" if it is unhealthy  
 
 
 
 
 
 
 
 
 

Issue Type:
 
  Improvement 
 
 
 

Assignee:
 
 Kanstantsin Shautsou 
 
 
 

Components:
 

 docker-plugin 
 
 
 

Created:
 

 20/Aug/15 10:11 AM 
 
 
 

Priority:
 
  Minor 
 
 
 

Reporter:
 
 pjdarton 
 
 
 
 
 
 
 
 
 
 
Jenkins contains functionality where it monitors the "health" of all its static slaves - it periodically checks that they have enough disk space available, and (most importantly) it takes the slave "offline" if the slave has insufficient disk space available. This means that Jenkins avoids running builds on slave that have low disk space. 
I'd like similar functionality built into the docker-plugin. 
At present, the docker-plugin code counts the number of containers that a docker host is currently running, compares that to the "Container Cap" number, and refuses to start new containers on that docker host if it has reached that cap. I would like to see an additional check of that docker host's filesystem space, so that the plugin also refused to start new containers on a host that had insufficient disk space. 
This RFE is because we have has builds fail because they were run on a container that was started on a docker host that had insufficient disk space, and I'd like Jenkins to (automatically) stop using docker hosts that had insufficient space (this would then allow Jenkins to continue to run builds on other docker hosts until someone had resolved the disk space problem).