[JIRA] (OVIRT-296) [jenkins] take offline faulty bad slaves
[ https://ovirt-jira.atlassian.net/browse/OVIRT-296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

eyal edri [Administrator] reassigned OVIRT-296:
---
    Assignee: Evgheni Dereveanchin  (was: infra)

> [jenkins] take offline faulty bad slaves
>
>                 Key: OVIRT-296
>                 URL: https://ovirt-jira.atlassian.net/browse/OVIRT-296
>             Project: oVirt - virtualization made easy
>          Issue Type: Task
>          Components: Jenkins
>    Affects Versions: Test
>            Reporter: eyal edri [Administrator]
>            Assignee: Evgheni Dereveanchin
>              Labels: jenkins, monitoring
>
> it seems that quite often we hit an issue with a specific slave on phx, due
> to various reasons (out of space/git/network/etc.),
> which leads to multiple jobs trying to run on it and failing.
> we need an automated way of detecting such slaves.
> proposal:
> add a groovy post-build step to jobs that takes a slave offline if it
> misbehaves, using:
> manager.build.getBuiltOn().toComputer().setTemporarilyOffline(true)
> the trick is to find such a slave and to know whether it failed
> consistently in the past X hours, to justify disabling it.
> we need some sort of counter or service that tracks slaves and their error
> state, and takes a specific slave offline based on it.
> for example:
> if a slave failed X jobs within Y time and their runtime was < Z min, it
> might indicate such a problem
> (e.g. 10 jobs failed on the same slave within 5 min, each running for less
> than 1 min).
> the post script should email infra@ovirt.org that it disabled a slave, so
> we can look into it.

--
This message was sent by Atlassian JIRA
(v1000.621.5#100023)
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra
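The detection rule proposed in the ticket (X failed jobs within Y time, each running for less than Z minutes) amounts to a sliding-window counter kept per slave. What follows is a minimal Python sketch under stated assumptions, not the ticket's implementation: the names BuildResult, SlaveFailureTracker and offline_notice are hypothetical, the default thresholds are simply the ticket's example (10 jobs / 5 min / 1 min), and results for a given slave are assumed to arrive in chronological order. When record() returns True, the Groovy post-build call quoted in the ticket would be the step that actually takes the slave offline; the mail is only built here, not sent (sending it, e.g. via smtplib, is left out).

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from email.message import EmailMessage


@dataclass(frozen=True)
class BuildResult:
    """One finished build, as a hypothetical post-build hook might report it."""
    slave: str
    finished_at: datetime
    duration: timedelta
    failed: bool


class SlaveFailureTracker:
    """Sliding-window counter of short failed builds per slave (hypothetical helper).

    A slave is flagged once it has at least `max_failures` failed builds that
    each ran for less than `max_runtime` and all finished within `window`.
    """

    def __init__(self, max_failures=10, window=timedelta(minutes=5),
                 max_runtime=timedelta(minutes=1)):
        self.max_failures = max_failures
        self.window = window
        self.max_runtime = max_runtime
        # slave name -> finish times of its recent short failures, oldest first
        self._recent = defaultdict(deque)

    def record(self, result: BuildResult) -> bool:
        """Record one build; return True if its slave should be taken offline.

        Assumes results for a given slave arrive in chronological order.
        """
        if not result.failed or result.duration >= self.max_runtime:
            return False  # successes and long-running failures are ignored
        times = self._recent[result.slave]
        times.append(result.finished_at)
        # Drop failures that have slid out of the time window.
        while times and result.finished_at - times[0] > self.window:
            times.popleft()
        return len(times) >= self.max_failures

    def failures(self, slave: str) -> int:
        """Number of short failures currently inside the window for `slave`."""
        return len(self._recent.get(slave, ()))


def offline_notice(slave: str, failures: int, window: timedelta) -> EmailMessage:
    """Build (but do not send) the notification mail the ticket asks for."""
    msg = EmailMessage()
    msg["To"] = "infra@ovirt.org"
    msg["Subject"] = f"[jenkins] slave {slave} was taken offline automatically"
    minutes = window.total_seconds() / 60
    msg.set_content(
        f"{slave} had {failures} short failed builds within {minutes:g} min "
        "and was disabled; please look into it."
    )
    return msg


if __name__ == "__main__":
    tracker = SlaveFailureTracker()
    start = datetime(2024, 1, 1, 12, 0)  # arbitrary sample time
    flagged = False
    for i in range(10):
        # ten 30-second failures on a hypothetical "slave-01", 20 s apart
        flagged = tracker.record(BuildResult(
            "slave-01", start + timedelta(seconds=20 * i), timedelta(seconds=30), True))
    print(flagged)  # True
    if flagged:
        print(offline_notice("slave-01", tracker.failures("slave-01"), tracker.window)["Subject"])
```

In this sketch a successful build does not reset a slave's counter; the ticket does not say whether it should, so that is left as a judgment call for whoever implements the real check.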
[JIRA] (OVIRT-296) [jenkins] take offline faulty bad slaves
eyal edri [Administrator] updated OVIRT-296:
---
    Priority: Medium  (was: Highest)
[JIRA] (OVIRT-296) [jenkins] take offline faulty bad slaves
eyal edri [Administrator] updated an issue
oVirt - virtualization made easy / OVIRT-296
[jenkins] take offline faulty bad slaves
Change By: eyal edri [Administrator]
Priority: Highest