[JIRA] (OVIRT-1840) jobs freeze due to unresponsive docker

2018-12-15 Thread Eyal Edri (oVirt JIRA)

[ 
https://ovirt-jira.atlassian.net/browse/OVIRT-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=38579#comment-38579
 ] 

Eyal Edri commented on OVIRT-1840:
--

Is it still happening? 

> jobs freeze due to unresponsive docker
> --
>
> Key: OVIRT-1840
> URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1840
> Project: oVirt - virtualization made easy
> Issue Type: Task
> Reporter: Evgheni Dereveanchin
> Assignee: infra
>
> Quite often I see jobs stuck at various stages for hours, apparently because 
> of docker.
> Example:
> http://jenkins.ovirt.org/job/ovirt-engine_master_build-artifacts-fc26-x86_64/610/console
> There are multiple docker commands stuck on the slave (I will post them in 
> the next comment), so it seems to be deadlocked. I am opening this ticket to 
> investigate exactly which step causes this and possible ways of resolving it. 
> The job in question doesn't even use docker, so it shouldn't suffer when this 
> happens.



--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100095)
___
Infra mailing list -- infra@ovirt.org
To unsubscribe send an email to infra-le...@ovirt.org
Privacy Statement: https://www.ovirt.org/site/privacy-policy/
oVirt Code of Conduct: 
https://www.ovirt.org/community/about/community-guidelines/
List Archives: 
https://lists.ovirt.org/archives/list/infra@ovirt.org/message/5DMY47AZ52VH7SWKRUJ6ZTW7YPSWZQIP/


[JIRA] (OVIRT-1840) jobs freeze due to unresponsive docker

2019-07-04 Thread Eyal Edri (oVirt JIRA)

 [ 
https://ovirt-jira.atlassian.net/browse/OVIRT-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Eyal Edri updated OVIRT-1840:
-
Resolution: Cannot Reproduce
Status: Done  (was: To Do)




List Archives: 
https://lists.ovirt.org/archives/list/infra@ovirt.org/message/T56WD3OL3N5ETLT42B57OEWDAZHZ7LNN/


[JIRA] (OVIRT-1840) jobs freeze due to unresponsive docker

2018-01-11 Thread Evgheni Dereveanchin (oVirt JIRA)
Evgheni Dereveanchin created OVIRT-1840:
---

Summary: jobs freeze due to unresponsive docker
Key: OVIRT-1840
URL: https://ovirt-jira.atlassian.net/browse/OVIRT-1840
Project: oVirt - virtualization made easy
Issue Type: Task
Reporter: Evgheni Dereveanchin
Assignee: infra


Quite often I see jobs stuck at various stages for hours, apparently because 
of docker.
Example:
http://jenkins.ovirt.org/job/ovirt-engine_master_build-artifacts-fc26-x86_64/610/console

There are multiple docker commands stuck on the slave (I will post them in the 
next comment), so it seems to be deadlocked. I am opening this ticket to 
investigate exactly which step causes this and possible ways of resolving it. 
The job in question doesn't even use docker, so it shouldn't suffer when this 
happens.
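
One possible mitigation, offered only as a sketch and not as something the ticket settled on, is to run auxiliary docker commands under a time limit so that an unresponsive daemon cannot hold a job for hours. The helper name run_with_timeout and the limits used here are assumptions made for illustration; the demonstration uses a sleeping Python process as a harmless stand-in for a hung docker client.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd and return (finished, returncode).

    finished is False when the command did not exit within timeout_s seconds.
    """
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before raising this exception.
        return False, None
    return True, proc.returncode


# Stand-in for a hung command: sleeps far longer than the limit allows.
slow = [sys.executable, "-c", "import time; time.sleep(5)"]
print(run_with_timeout(slow, 0.5))  # (False, None)
```

On a slave the call might look like run_with_timeout(["docker", "images"], 60). Note that a client blocked in uninterruptible sleep may not react to the kill, so this guards against some hangs rather than all of them.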




--
This message was sent by Atlassian Jira
(v1001.0.0-SNAPSHOT#100075)
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


[JIRA] (OVIRT-1840) jobs freeze due to unresponsive docker

2018-01-11 Thread Evgheni Dereveanchin (oVirt JIRA)

[ 
https://ovirt-jira.atlassian.net/browse/OVIRT-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=35637#comment-35637
 ] 

Evgheni Dereveanchin commented on OVIRT-1840:
-

Looking at the slave, here's the stuck part:
   `-bash -c cd "/home/jenkins" && java  -jar slave.jar
   `-java -jar slave.jar
   |-bash -ex /tmp/jenkins6289914333579712645.sh
   |   `-bash -ex /tmp/jenkins6289914333579712645.sh
   |   |-grep -oP .+?(?=:exported-artifacts)
   |   `-sudo -n docker images --format={{.Repository}}:{{.Tag}}
   |   `-docker-current images --format={{.Repository}}:{{.Tag}}
   |   `-5*[{docker-current}]


At the same time, I see the following in pstree output of the same node:
-sudo -n docker images --format={{.Repository}}:{{.Tag}}
   `-docker-current images --format={{.Repository}}:{{.Tag}}
   `-5*[{docker-current}]
-sudo systemctl start docker
   `-systemctl start docker
-sudo -n /bin/yum install -y docker
   `-yum /bin/yum install -y docker
   `-sh /var/tmp/rpm-tmp.gO7ceb 1
   `-systemctl try-restart docker.service
-sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0
   `-sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0
   `-docker-current ps -aq -f status=dead
   `-6*[{docker-current}]

All of these commands, started at various stages of the job, are stuck, even 
though docker wasn't used anywhere in the job itself.
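
Spotting such processes could be automated. The following is a minimal sketch, assuming the input is text shaped like the output of `ps -eo pid,etimes,args` (PID, elapsed seconds, command line). The function name find_stuck, the one-hour threshold, and the PIDs and elapsed times in the sample are all hypothetical; only the command lines are taken from the process trees above.

```python
def find_stuck(ps_output, needle="docker", min_seconds=3600):
    """Return (pid, elapsed_seconds, command) for long-running matching processes.

    ps_output is expected to look like the output of `ps -eo pid,etimes,args`.
    """
    stuck = []
    for line in ps_output.splitlines():
        parts = line.split(None, 2)
        if len(parts) < 3:
            continue
        pid, etimes, args = parts
        if not (pid.isdigit() and etimes.isdigit()):
            continue  # header line or malformed row
        if needle in args and int(etimes) >= min_seconds:
            stuck.append((int(pid), int(etimes), args))
    return stuck


# Hypothetical sample: the PIDs and elapsed times are invented for illustration.
sample = """\
  PID ELAPSED COMMAND
 4211   21540 docker-current images --format={{.Repository}}:{{.Tag}}
 4390   21000 systemctl try-restart docker.service
 5120      12 docker-current ps -aq -f status=dead
 5200   30000 java -jar slave.jar
"""

for pid, seconds, command in find_stuck(sample):
    print(pid, seconds, command)
```

With the sample above, only the first two rows are reported: the third has not run long enough and the fourth does not mention docker.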






[JIRA] (OVIRT-1840) jobs freeze due to unresponsive docker

2018-01-11 Thread Evgheni Dereveanchin (oVirt JIRA)

[ 
https://ovirt-jira.atlassian.net/browse/OVIRT-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=35637#comment-35637
 ] 

Evgheni Dereveanchin edited comment on OVIRT-1840 at 1/11/18 8:51 AM:
--

Looking at the slave, here's the stuck part:
   `-bash -c cd "/home/jenkins" && java  -jar slave.jar
   `-java -jar slave.jar
   |-bash -ex /tmp/jenkins6289914333579712645.sh
   |   `-bash -ex /tmp/jenkins6289914333579712645.sh
   |   |-grep -oP .+?(?=:exported-artifacts)
   |   `-sudo -n docker images --format={{.Repository}}:{{.Tag}}
   |   `-docker-current images --format={{.Repository}}:{{.Tag}}
   |   `-5*[{docker-current}]


At the same time, I see the following in pstree output of the same node:
-sudo -n docker images --format={{.Repository}}:{{.Tag}}
   `-docker-current images --format={{.Repository}}:{{.Tag}}
   `-5*[{docker-current}]
-sudo systemctl start docker
   `-systemctl start docker
-sudo -n /bin/yum install -y docker
   `-yum /bin/yum install -y docker
   `-sh /var/tmp/rpm-tmp.gO7ceb 1
   `-systemctl try-restart docker.service
-sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0
   `-sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0
   `-docker-current ps -aq -f status=dead
   `-6*[{docker-current}]

All of these commands, started at various stages of the job, are stuck, even 
though docker wasn't used anywhere in the job itself.








[JIRA] (OVIRT-1840) jobs freeze due to unresponsive docker

2018-01-11 Thread Evgheni Dereveanchin (oVirt JIRA)

[ 
https://ovirt-jira.atlassian.net/browse/OVIRT-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=35637#comment-35637
 ] 

Evgheni Dereveanchin edited comment on OVIRT-1840 at 1/11/18 8:54 AM:
--

Looking at the slave, here's the stuck part:
   `-bash -c cd "/home/jenkins" && java  -jar slave.jar
   `-java -jar slave.jar
   |-bash -ex /tmp/jenkins6289914333579712645.sh
   |   `-bash -ex /tmp/jenkins6289914333579712645.sh
   |   |-grep -oP .+?(?=:exported-artifacts)
   |   `-sudo -n docker images --format={{.Repository}}:{{.Tag}}
   |   `-docker-current images --format={{.Repository}}:{{.Tag}}
   |   `-5*[{docker-current}]


At the same time, I see the following in pstree output of the same node:
-sudo -n docker images --format={{.Repository}}:{{.Tag}}
   `-docker-current images --format={{.Repository}}:{{.Tag}}
   `-5*[{docker-current}]
-sudo systemctl start docker
   `-systemctl start docker
-sudo -n /bin/yum install -y docker
   `-yum /bin/yum install -y docker
   `-sh /var/tmp/rpm-tmp.gO7ceb 1
   `-systemctl try-restart docker.service
-sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0
   `-sh -c DEAD=`docker ps -aq -f status=dead` && [ -n "$DEAD" ] && docker rm $DEAD; exit 0
   `-docker-current ps -aq -f status=dead
   `-6*[{docker-current}]

All of these commands, started at various stages of the job, are stuck, even 
though docker wasn't used anywhere in the job itself.








[JIRA] (OVIRT-1840) jobs freeze due to unresponsive docker

2018-01-11 Thread Evgheni Dereveanchin (oVirt JIRA)

[ 
https://ovirt-jira.atlassian.net/browse/OVIRT-1840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=35638#comment-35638
 ] 

Evgheni Dereveanchin commented on OVIRT-1840:
-

Judging by the timestamps of the docker installation, the slave seems to have 
been broken during this build:
http://jenkins.ovirt.org/job/vdsm_master_check-merged-el7-x86_64/3039/console

07:49:56 Running transaction
07:49:57   Updating   : container-storage-setup-0.8.0-3.git1d27ecf.el7.noarch     1/12
07:49:57   Updating   : 2:oci-umount-2.3.0-1.git51e7c50.el7.x86_64                2/12
07:50:35   Updating   : 2:container-selinux-2.33-1.git86f33cd.el7.noarch          3/12
07:50:36   Updating   : 2:docker-common-1.12.6-68.gitec8512b.el7.centos.x86_64    4/12
07:50:38   Updating   : 2:docker-client-1.12.6-68.gitec8512b.el7.centos.x86_64    5/12
07:50:39   Updating   : 2:docker-1.12.6-68.gitec8512b.el7.centos.x86_64           6/12
13:49:32   Cleanup    : 2:docker-1.12.6-48.git0fdc778.el7.centos.x86_64           7/12
Build timed out (after 360 minutes). Marking the build as failed.

This explains the leftover yum processes on the system, which can block further 
yum installs or even, in theory, corrupt the RPMDB if docker unfreezes for some 
reason.
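
The stall is visible as the jump between the 6/12 and 7/12 lines of the console log. As a rough sketch of how such a jump could be located automatically, the snippet below scans lines that start with an HH:MM:SS stamp and reports the largest pause between consecutive stamped lines. The function name largest_gap is an assumption, and the sketch assumes all timestamps fall on the same day.

```python
import re
from datetime import datetime, timedelta

# Matches a leading HH:MM:SS stamp followed by whitespace.
STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2})\s")


def largest_gap(lines):
    """Return (gap, line_before, line_after) for the longest pause in the log."""
    best = (timedelta(0), None, None)
    prev_time, prev_line = None, None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip lines without a timestamp
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        if prev_time is not None and stamp - prev_time > best[0]:
            best = (stamp - prev_time, prev_line, line)
        prev_time, prev_line = stamp, line
    return best


# Three lines taken from the console log quoted above.
log = [
    "07:49:56 Running transaction",
    "07:50:39   Updating   : 2:docker-1.12.6-68.gitec8512b.el7.centos.x86_64 6/12",
    "13:49:32   Cleanup    : 2:docker-1.12.6-48.git0fdc778.el7.centos.x86_64 7/12",
    "Build timed out (after 360 minutes). Marking the build as failed.",
]

gap, before, after = largest_gap(log)
print(gap)  # 5:58:53
```

The pause of almost six hours lines up with the 360-minute build timeout reported at the end of the log.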



