Debugging stuck vdsm jobs

2016-05-26 Thread Nir Soffer
Hi all,

We had 2 issues causing vdsm check-patch and check-merge jobs to get stuck.

I fixed the one that caused most trouble:
https://gerrit.ovirt.org/57993

The other issue may be related to ioprocess; I fixed a related issue:
https://gerrit.ovirt.org/57473

But I have seen stuck jobs after this change, so the issue may not
be fixed yet.

If you see a stuck vdsm job (a job that has been running for more than
15 minutes), please get me a backtrace:

1. locate the testrunner.py process pid:

$ ps aux | grep testrunner.py | grep -v grep
nsoffer  26297 82.6  0.9 389592 44 pts/3   R+   22:52   0:02 /usr/bin/python ../tests/testrunner.py ...

2. save a backtrace:

gdb attach 26297 --batch -ex "thread apply all py-bt" > py-bt.out
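
The two steps above can be sketched as a small Python helper. This is only a minimal sketch, not part of vdsm: the function names and the per-pid output file name are made up for illustration, it attaches with gdb's `-p` option, and it assumes gdb with the Python extensions (for `py-bt`) is installed on the slave.

```python
import subprocess


def find_testrunner_pids(ps_output):
    """Return pids of processes whose command line mentions testrunner.py.

    Expects the text printed by `ps aux` (11 columns, command last).
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # header line or something we cannot parse
        command = fields[10]
        # Same idea as `grep testrunner.py | grep -v grep`.
        if "testrunner.py" in command and "grep" not in command:
            pids.append(int(fields[1]))
    return pids


def gdb_backtrace_command(pid):
    """Build the gdb command line that dumps Python backtraces of all threads."""
    return ["gdb", "-p", str(pid), "--batch", "-ex", "thread apply all py-bt"]


def save_backtraces():
    """Write one py-bt.<pid>.out file per running testrunner process."""
    ps_output = subprocess.check_output(["ps", "aux"], universal_newlines=True)
    for pid in find_testrunner_pids(ps_output):
        with open("py-bt.%d.out" % pid, "w") as out:
            subprocess.call(gdb_backtrace_command(pid),
                            stdout=out, stderr=subprocess.STDOUT)
```

Calling `save_backtraces()` on the machine running the stuck job (as a user allowed to attach to the process) should leave the backtrace files in the current directory.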

Thanks,
Nir
___
Infra mailing list
Infra@ovirt.org
http://lists.ovirt.org/mailman/listinfo/infra


[oVirt Jenkins] ovirt-engine_master_upgrade-from-master_el7_merged - Build # 417 - Failure!

2016-05-26 Thread jenkins
Project: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-master_el7_merged/
 
Build: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-master_el7_merged/417/
Build Number: 417
Build Status:  Failure
Triggered By: Triggered by Gerrit: https://gerrit.ovirt.org/58064

-
Changes Since Last Success:
-
Changes for Build #417
No changes



-
Failed Tests:
-
No tests ran. 



Appliance job build failure because of ovirt-3.6-epel

2016-05-26 Thread Fabian Deutsch
Hey,

the 3.6 job completes, but without an engine:

http://jenkins.ovirt.org/user/fabiand/my-views/view/appliance/job/ovirt-appliance_ovirt-3.6_build-artifacts-el7-x86_64/lastSuccessfulBuild/artifact/exported-artifacts/anaconda.log/*view*/

The problem should also be present in every following job until
it's fixed :)

16:01:06,691 INFO program: + yum install -y ovirt-engine
16:01:06,691 INFO program: Loaded plugins: fastestmirror
16:01:06,692 INFO program:
http://download.fedoraproject.org/pub/epel/7/x86_64/repodata/55d4bcbc6bcd8727167925d216c94c7f5217b921d892da747b84d079c5905a7b-updateinfo.xml.bz2:
[Errno 14] HTTP Error 404 - Not Found
16:01:06,693 INFO program: Trying other mirror.
16:01:06,694 INFO program: To address this issue please refer to the
below knowledge base article
16:01:06,694 INFO program:
16:01:06,695 INFO program: https://access.redhat.com/articles/1320623
16:01:06,695 INFO program:
16:01:06,695 INFO program: If above article doesn't help to resolve
this issue please create a bug on https://bugs.centos.org/
16:01:06,696 INFO program:
16:01:06,697 INFO program:
http://download.fedoraproject.org/pub/epel/7/x86_64/repodata/3abc3e70be643a17bb37e3f3e1dd057d8c6242c579412fc50de180b9882e0a99-primary.sqlite.xz:
[Errno 14] HTTP Error 404 - Not Found
16:01:06,699 INFO program: Trying other mirror.
16:01:06,699 INFO program: Determining fastest mirrors
16:01:06,700 INFO program: * base: centos-distro.cavecreek.net
16:01:06,701 INFO program: * extras: centos.host-engine.com
16:01:06,701 INFO program: * updates: mirror.n5tech.com
16:01:06,702 INFO program:
http://download.fedoraproject.org/pub/epel/7/x86_64/repodata/3abc3e70be643a17bb37e3f3e1dd057d8c6242c579412fc50de180b9882e0a99-primary.sqlite.xz:
[Errno 14] HTTP Error 404 - Not Found
16:01:06,702 INFO program: Trying other mirror.
16:01:06,702 INFO program:
http://download.fedoraproject.org/pub/epel/7/x86_64/repodata/3abc3e70be643a17bb37e3f3e1dd057d8c6242c579412fc50de180b9882e0a99-primary.sqlite.xz:
[Errno 14] HTTP Error 404 - Not Found
16:01:06,703 INFO program: Trying other mirror.
16:01:06,703 INFO program:
16:01:06,703 INFO program:
16:01:06,704 INFO program: One of the configured repositories failed
(Extra Packages for Enterprise Linux 7 - x86_64),
16:01:06,705 INFO program: and yum doesn't have enough cached data to
continue. At this point the only
16:01:06,706 INFO program: safe thing yum can do is fail. There are a
few ways to work "fix" this:
16:01:06,706 INFO program:
16:01:06,706 INFO program: 1. Contact the upstream for the repository
and get them to fix the problem.
16:01:06,707 INFO program:
16:01:06,707 INFO program: 2. Reconfigure the baseurl/etc. for the
repository, to point to a working
16:01:06,707 INFO program: upstream. This is most often useful if you
are using a newer
16:01:06,708 INFO program: distribution release than is supported by
the repository (and the
16:01:06,708 INFO program: packages for the previous distribution
release still work).
16:01:06,708 INFO program:
16:01:06,708 INFO program: 3. Disable the repository, so yum won't use
it by default. Yum will then
16:01:06,709 INFO program: just ignore the repository until you
permanently enable it again or use
16:01:06,709 INFO program: --enablerepo for temporary usage:
16:01:06,709 INFO program:
16:01:06,710 INFO program: yum-config-manager --disable ovirt-3.6-epel
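
To see at a glance which repository files are missing, the download errors in a log like the one above can be counted per URL. The following is a rough sketch only; the function name and the regular expression are illustrative assumptions based on the lines quoted here, not a complete parser for anaconda.log.

```python
import re
from collections import Counter

# Matches "<url>: [Errno 14] HTTP Error <code>", also when the mail client
# wrapped the URL and the error text onto separate lines.
FETCH_ERROR = re.compile(r"(https?://\S+?):\s*\[Errno 14\] HTTP Error (\d{3})")


def failed_downloads(log_text):
    """Count (url, http_status) pairs for yum download errors in the log text."""
    return Counter((url, int(code)) for url, code in FETCH_ERROR.findall(log_text))
```

Printing `failed_downloads(open("anaconda.log").read()).most_common()` would list the most frequently failing repodata files first.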


- fabian

-- 
Fabian Deutsch 
RHEV Hypervisor
Red Hat


Re: Maintenance on the Mailing-Lists

2016-05-26 Thread Sandro Bonazzola
Yes, copy.
On 26 May 2016 17:11, "Marc Dequènes (Duck)" wrote:

>
> On 05/26/2016 11:57 PM, Marc Dequènes (Duck) wrote:
> > Quack,
> >
> > Changes done. Do you copy?
>
> second call for test, sorry for the noise.


Re: Maintenance on the Mailing-Lists

2016-05-26 Thread David Caro
On 05/26 23:57, Marc Dequènes (Duck) wrote:
> Quack,
> 
> Changes done. Do you copy?
> 

Yep, copy (and good signature too)





-- 
David Caro

Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D

Tel.: +420 532 294 605
Email: dc...@redhat.com
IRC: dcaro|dcaroest@{freenode|oftc|redhat}
Web: www.redhat.com
RHT Global #: 82-62605




Re: Maintenance on the Mailing-Lists

2016-05-26 Thread Duck

On 05/26/2016 11:57 PM, Marc Dequènes (Duck) wrote:
> Quack,
> 
> Changes done. Do you copy?

second call for test, sorry for the noise.






Re: Maintenance on the Mailing-Lists

2016-05-26 Thread Duck
Quack,

Changes done. Do you copy?





[oVirt Jenkins] ovirt-engine_master_upgrade-from-4.0_el7_merged - Build # 26 - Still Failing!

2016-05-26 Thread jenkins
Project: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-4.0_el7_merged/ 
Build: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-4.0_el7_merged/26/
Build Number: 26
Build Status:  Still Failing
Triggered By: Triggered by Gerrit: https://gerrit.ovirt.org/58139

-
Changes Since Last Success:
-
Changes for Build #19
[Tal Nisan] core: Add tests to OvfManagerTest


Changes for Build #20
[Sandro Bonazzola] ovirt-live: add 4.0 branch

[Martin Perina] aaa-ldap: Add 1.1 branch

[Marek Libra] webadmin: Forbid cluster version change if a VM is active


Changes for Build #21
[Allon Mureinik] core: PKIResources type inference


Changes for Build #22
No changes

Changes for Build #23
[Tomas Jelinek] userportal: New VM dialog offers each VM template twice


Changes for Build #24
[Martin Sivak] Fix a policy unit db upgrade script according to oVirt style 
rules


Changes for Build #25
[Allon Mureinik] core: GetAllAttachableDisksForVmQuery branching


Changes for Build #26
[Allon Mureinik] core: GetAllAttachableDisksForVmQuery's DbFacade




-
Failed Tests:
-
No tests ran. 



[oVirt Jenkins] ovirt-engine_master_upgrade-from-4.0_el7_merged - Build # 25 - Still Failing!

2016-05-26 Thread jenkins
Project: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-4.0_el7_merged/ 
Build: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-4.0_el7_merged/25/
Build Number: 25
Build Status:  Still Failing
Triggered By: Triggered by Gerrit: https://gerrit.ovirt.org/58138

-
Changes Since Last Success:
-
Changes for Build #19
[Tal Nisan] core: Add tests to OvfManagerTest


Changes for Build #20
[Sandro Bonazzola] ovirt-live: add 4.0 branch

[Martin Perina] aaa-ldap: Add 1.1 branch

[Marek Libra] webadmin: Forbid cluster version change if a VM is active


Changes for Build #21
[Allon Mureinik] core: PKIResources type inference


Changes for Build #22
No changes

Changes for Build #23
[Tomas Jelinek] userportal: New VM dialog offers each VM template twice


Changes for Build #24
[Martin Sivak] Fix a policy unit db upgrade script according to oVirt style 
rules


Changes for Build #25
[Allon Mureinik] core: GetAllAttachableDisksForVmQuery branching




-
Failed Tests:
-
No tests ran. 



[oVirt Jenkins] ovirt-engine_master_upgrade-from-4.0_el7_merged - Build # 24 - Still Failing!

2016-05-26 Thread jenkins
Project: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-4.0_el7_merged/ 
Build: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-4.0_el7_merged/24/
Build Number: 24
Build Status:  Still Failing
Triggered By: Triggered by Gerrit: https://gerrit.ovirt.org/51361

-
Changes Since Last Success:
-
Changes for Build #19
[Tal Nisan] core: Add tests to OvfManagerTest


Changes for Build #20
[Sandro Bonazzola] ovirt-live: add 4.0 branch

[Martin Perina] aaa-ldap: Add 1.1 branch

[Marek Libra] webadmin: Forbid cluster version change if a VM is active


Changes for Build #21
[Allon Mureinik] core: PKIResources type inference


Changes for Build #22
No changes

Changes for Build #23
[Tomas Jelinek] userportal: New VM dialog offers each VM template twice


Changes for Build #24
[Martin Sivak] Fix a policy unit db upgrade script according to oVirt style 
rules




-
Failed Tests:
-
No tests ran. 



[oVirt Jenkins] ovirt-engine_master_upgrade-from-4.0_el7_merged - Build # 23 - Still Failing!

2016-05-26 Thread jenkins
Project: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-4.0_el7_merged/ 
Build: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-4.0_el7_merged/23/
Build Number: 23
Build Status:  Still Failing
Triggered By: Triggered by Gerrit: https://gerrit.ovirt.org/58076

-
Changes Since Last Success:
-
Changes for Build #19
[Tal Nisan] core: Add tests to OvfManagerTest


Changes for Build #20
[Sandro Bonazzola] ovirt-live: add 4.0 branch

[Martin Perina] aaa-ldap: Add 1.1 branch

[Marek Libra] webadmin: Forbid cluster version change if a VM is active


Changes for Build #21
[Allon Mureinik] core: PKIResources type inference


Changes for Build #22
No changes

Changes for Build #23
[Tomas Jelinek] userportal: New VM dialog offers each VM template twice




-
Failed Tests:
-
No tests ran. 



[oVirt Jenkins] ovirt-engine_master_upgrade-from-master_el7_merged - Build # 413 - Still Failing!

2016-05-26 Thread jenkins
Project: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-master_el7_merged/
 
Build: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-master_el7_merged/413/
Build Number: 413
Build Status:  Still Failing
Triggered By: Triggered by Gerrit: https://gerrit.ovirt.org/58076

-
Changes Since Last Success:
-
Changes for Build #412
No changes

Changes for Build #413
No changes



-
Failed Tests:
-
No tests ran. 



[oVirt Jenkins] ovirt-engine_master_upgrade-from-4.0_el7_merged - Build # 22 - Still Failing!

2016-05-26 Thread jenkins
Project: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-4.0_el7_merged/ 
Build: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-4.0_el7_merged/22/
Build Number: 22
Build Status:  Still Failing
Triggered By: Triggered by Gerrit: https://gerrit.ovirt.org/58074

-
Changes Since Last Success:
-
Changes for Build #19
[Tal Nisan] core: Add tests to OvfManagerTest


Changes for Build #20
[Sandro Bonazzola] ovirt-live: add 4.0 branch

[Martin Perina] aaa-ldap: Add 1.1 branch

[Marek Libra] webadmin: Forbid cluster version change if a VM is active


Changes for Build #21
[Allon Mureinik] core: PKIResources type inference


Changes for Build #22
No changes



-
Failed Tests:
-
No tests ran. 



[oVirt Jenkins] ovirt-engine_master_upgrade-from-master_el7_merged - Build # 412 - Failure!

2016-05-26 Thread jenkins
Project: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-master_el7_merged/
 
Build: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-master_el7_merged/412/
Build Number: 412
Build Status:  Failure
Triggered By: Triggered by Gerrit: https://gerrit.ovirt.org/58074

-
Changes Since Last Success:
-
Changes for Build #412
No changes



-
Failed Tests:
-
No tests ran. 



[oVirt Jenkins] ovirt-engine_master_upgrade-from-4.0_el7_merged - Build # 21 - Still Failing!

2016-05-26 Thread jenkins
Project: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-4.0_el7_merged/ 
Build: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-4.0_el7_merged/21/
Build Number: 21
Build Status:  Still Failing
Triggered By: Triggered by Gerrit: https://gerrit.ovirt.org/58100

-
Changes Since Last Success:
-
Changes for Build #19
[Tal Nisan] core: Add tests to OvfManagerTest


Changes for Build #20
[Sandro Bonazzola] ovirt-live: add 4.0 branch

[Martin Perina] aaa-ldap: Add 1.1 branch

[Marek Libra] webadmin: Forbid cluster version change if a VM is active


Changes for Build #21
[Allon Mureinik] core: PKIResources type inference




-
Failed Tests:
-
No tests ran. 



Re: IPv6 RR disabled on lists.ovirt.org -- WHY???

2016-05-26 Thread Anton Marchukov
Hello All.

Based on the forwarded message and the DNS checks I did, everything should
be fine. The only concern is that having a CNAME for the mail server may not
be a good idea; it is better to make 'lists' direct A and AAAA records. The
reverse records are already fine. As far as I can see that is the plan, so
it looks good. Once we make the change, we can check the headers of the first
message received after DNS propagation, for example at Google, and see
whether it is treated as an SPF pass.
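
The header check can be sketched in Python with the standard `email` module. This is a minimal sketch under assumptions: the function name is made up, and it assumes the receiving side records its verdict in `Authentication-Results` and/or `Received-SPF` headers, which may differ between providers.

```python
import email
import re


def spf_results(raw_message):
    """Return the SPF verdicts (e.g. 'pass', 'softfail') found in the headers."""
    msg = email.message_from_string(raw_message)
    verdicts = []
    # Authentication-Results carries entries such as "spf=pass (...)".
    for value in msg.get_all("Authentication-Results", []):
        verdicts.extend(v.lower() for v in re.findall(r"\bspf=(\w+)", value))
    # Received-SPF starts with the verdict word itself.
    for value in msg.get_all("Received-SPF", []):
        words = value.split(None, 1)
        if words:
            verdicts.append(words[0].lower())
    return verdicts
```

Feeding it the raw source of a list message received after the change and checking that every returned verdict is `pass` would confirm the new records are accepted.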

Anton.

On Wed, May 25, 2016 at 10:15 AM, Marc Dequènes (Duck) 
wrote:

> Quack,
>
> Thanks dneary for coming to this thread.
>
> On 05/25/2016 04:24 AM, Karsten Wade wrote:
>
> > How about experimenting and see what happens (SCIENCE!), maybe with a
> > warning to the two main lists (devel, users) in case anything breaks?
>
> I'm in favor of experimenting too.
>
> I see no reason not to have IPv6 working on the machines' services after
> a look at the configurations. The IPv6 reverse is good, we/I only have
> to re-add the direct AAAA and reenable Postfix IPv6, and watch :-).
>
> I will do that tomorrow unless someone raises concerns.
>
> Regards.


-- 
Anton Marchukov
Senior Software Engineer - RHEV CI - Red Hat


[oVirt Jenkins] ovirt-engine_master_upgrade-from-4.0_el7_merged - Build # 20 - Still Failing!

2016-05-26 Thread jenkins
Project: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-4.0_el7_merged/ 
Build: 
http://jenkins.ovirt.org/job/ovirt-engine_master_upgrade-from-4.0_el7_merged/20/
Build Number: 20
Build Status:  Still Failing
Triggered By: Triggered by Gerrit: https://gerrit.ovirt.org/57799

-
Changes Since Last Success:
-
Changes for Build #19
[Tal Nisan] core: Add tests to OvfManagerTest


Changes for Build #20
[Sandro Bonazzola] ovirt-live: add 4.0 branch

[Martin Perina] aaa-ldap: Add 1.1 branch

[Marek Libra] webadmin: Forbid cluster version change if a VM is active




-
Failed Tests:
-
No tests ran. 



Re: ngn build jobs take more than twice (x) as long as in the last days

2016-05-26 Thread David Caro
On 05/26 10:20, Barak Korren wrote:
> >
> >
> > I agree a stable distributed storage solution is the way to go if we can
> > find one :)
> >
> 
> Distributed storage systems usually suffer from a large overhead because:
> 1. They try to be resilient to node failure, which means keeping two
> or more copies of the same file, which results in I/O overhead.
> 2. They need to coordinate metadata access for large numbers of files.
> Bottlenecks in the metadata management system are a common issue for
> distributed FS storage.
> 
> Since most of our data is ephemeral anyway I don't think we need to
> pay this overhead.

The solution for our current temporary ephemeral data would be for each node
to create the VMs locally; that's the scratch-disk solution we started with.

The distributed storage would be used to store the Jenkins machine templates,
which would mostly be read by the hosts and can thus be cached locally with
a low miss rate (as they don't usually change). The goal is to avoid the
central storage entirely, since its extra levels of redundancy are only useful
for more critical data (such as production datacenter machines).

> 
> 
> -- 
> Barak Korren
> bkor...@redhat.com
> RHEV-CI Team

-- 
David Caro

Red Hat S.L.
Continuous Integration Engineer - EMEA ENG Virtualization R&D

Tel.: +420 532 294 605
Email: dc...@redhat.com
IRC: dcaro|dcaroest@{freenode|oftc|redhat}
Web: www.redhat.com
RHT Global #: 82-62605




Re: Maintenance on the Mailing-Lists

2016-05-26 Thread Duck

On 05/26/2016 03:41 PM, Sandro Bonazzola wrote:

> Note that we are on #ovirt@oftc :-)

Hum :-/. Well I guess they can auto-correct.

So I've asked IT about the DNS change.

Also I asked for the 'lists' entry to be made A/AAAA, as I remember
having problems with CNAMEs myself a long time ago, even without SPF
involved. We could also point the MX to 'lists' instead of 'linode1'
later. We should not use the 'linode1' name for external services at all.

I'll send a message when it is done and Postfix's config is changed.
I'll ping people on IRC to see if you receive it well.

Regards.





Re: ngn build jobs take more than twice (x) as long as in the last days

2016-05-26 Thread Barak Korren
>
>
> I agree a stable distributed storage solution is the way to go if we can
> find one :)
>

Distributed storage systems usually suffer from a large overhead because:
1. They try to be resilient to node failure, which means keeping two
or more copies of the same file, which results in I/O overhead.
2. They need to coordinate metadata access for large numbers of files.
Bottlenecks in the metadata management system are a common issue for
distributed FS storage.

Since most of our data is ephemeral anyway I don't think we need to
pay this overhead.


-- 
Barak Korren
bkor...@redhat.com
RHEV-CI Team