[Gluster-infra] NetBSD regression fixes

2016-01-16 Thread Emmanuel Dreyfus
Hello all

Here are the problems identified in NetBSD regression so far:

1) Before starting regression, the slave complains about "vnconfig:
VNDIOCGET: Bad file descriptor" and fails the run.

This will be fixed by these changes:
http://review.gluster.org/13204
http://review.gluster.org/13205


2) Spurious failures
I added a retry-failed-test-once feature so that we get fewer regression
failures caused by spurious test failures. It is not used right now
because it does not play nicely with the bad tests blacklist.

This will be fixed by these changes:
http://review.gluster.org/13245
http://review.gluster.org/13247

With that trick, I have been running regression in a loop for a while
without any failure.
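
As a rough illustration of the idea (this is not the code from the changes
above), a retry-once loop that honours a blacklist and keeps running the
remaining tests could look like the sketch below. BAD_TESTS, run_one and
run_all are hypothetical names, and it assumes each test is a script whose
exit status tells whether it passed.

```shell
#!/bin/sh
# Sketch only: retry each failed test once, skip blacklisted tests,
# and keep going so that one failure does not hide later ones.
# BAD_TESTS, run_one and run_all are hypothetical names.

BAD_TESTS="${BAD_TESTS:-}"      # space-separated list of known-bad tests

# Succeeds if $1 is listed in BAD_TESTS.
is_bad_test() {
    for bad in $BAD_TESTS; do
        [ "$bad" = "$1" ] && return 0
    done
    return 1
}

# Assumption: a test is a script whose exit status signals success.
run_one() {
    sh "$1" >/dev/null 2>&1
}

# Run every test given as argument; succeed only if none failed twice.
run_all() {
    failed=""
    for t in "$@"; do
        if is_bad_test "$t"; then
            echo "SKIP  $t (blacklisted)"
            continue
        fi
        if run_one "$t"; then
            echo "PASS  $t"
        elif run_one "$t"; then
            echo "PASS  $t (on retry)"
        else
            echo "FAIL  $t"
            failed="$failed $t"
        fi
    done
    [ -z "$failed" ]
}
```

The run is reported as good only once every non-blacklisted test has
passed, either directly or on its single retry.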


3) Stale state from previous regression
We sometimes have processes stuck from a previous regression, waiting
for vnode locks on destroyed NFS filesystems. This causes the cleanup
scripts that run before regression starts to hang, and we get a timeout.

I modified the slave's /opt/qa/regression.sh to check for stuck processes
and reboot the system if it finds any. That will fail the current
regression run, but at least the runs coming after the reboot will be
safe.

This fix is not deployed yet; I am waiting for the fixes from point 2 to
be merged.
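
For illustration only (this is not the actual regression.sh change), such
a check could be sketched as below. STUCK_WCHAN, find_stuck and
check_stuck_and_reboot are hypothetical names, and the wait channel value
"tstile" is an assumption that would have to be verified on the slave.

```shell
#!/bin/sh
# Sketch only. STUCK_WCHAN is a placeholder: the wait channel name that
# identifies a process blocked on a vnode lock must be checked on the
# actual slave; "tstile" is only an assumed example value.
STUCK_WCHAN="${STUCK_WCHAN:-tstile}"

# Read "pid wchan command" lines on stdin and print "pid command" for
# each process whose wait channel equals STUCK_WCHAN.
find_stuck() {
    awk -v w="$STUCK_WCHAN" '$2 == w { print $1, $3 }'
}

# Reboot if any stuck process is found. The current run is lost.
check_stuck_and_reboot() {
    stuck=$(ps -ax -o pid= -o wchan= -o comm= | find_stuck)
    if [ -n "$stuck" ]; then
        echo "Stuck processes from a previous run, rebooting:" >&2
        echo "$stuck" >&2
        /sbin/shutdown -r now
        exit 1
    fi
}
```

Keeping the parsing in find_stuck separate from the reboot makes it
possible to try the detection on captured ps output without rebooting
anything.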


4) Jenkins schedules concurrent runs on the same slave
We observed that Jenkins sometimes runs two jobs on the same slave at
once, which of course can only lead to horrible failure.

I modified the slave's /opt/qa/regression.sh to add a lock file so that
this situation is detected early and reported. The second regression will
fail, but the idea is to get a better understanding of how that can
occur.

This fix is not deployed yet; I am waiting for the fixes from point 2 to
be merged.
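
A minimal sketch of such a lock file, not the actual regression.sh change,
could look like this. LOCKFILE, its path, acquire_lock and release_lock
are hypothetical names chosen for the example.

```shell
#!/bin/sh
# Sketch only: LOCKFILE and its path are hypothetical.
LOCKFILE="${LOCKFILE:-/var/run/regression.lock}"

acquire_lock() {
    # With noclobber (set -C), ">" refuses to overwrite an existing
    # file, so a second caller finds the lock and fails here.
    if ( set -C; echo "$$" > "$LOCKFILE" ) 2>/dev/null; then
        return 0
    fi
    echo "Another regression seems to be running (pid" \
         "$(cat "$LOCKFILE" 2>/dev/null || echo unknown))" >&2
    return 1
}

release_lock() {
    rm -f "$LOCKFILE"
}

# Possible usage at the top of the script:
#   acquire_lock || exit 1
#   trap release_lock EXIT
```

Recording the pid in the lock file gives the report something concrete to
point at. A stale lock left behind by a crashed run would still need
manual removal or a liveness check on the recorded pid.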

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org
___
Gluster-infra mailing list
Gluster-infra@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-infra


Re: [Gluster-infra] NetBSD regression fixes

2016-01-16 Thread Emmanuel Dreyfus
Emmanuel Dreyfus wrote:

> But I just realized the change is wrong, since running tests the "new
> way" stops on the first failed test. My change just retries the failed
> test and considers the regression run to be good on success, without
> running the remaining tests.
> 
> I will post an update shortly.

Done:
http://review.gluster.org/13245
http://review.gluster.org/13247
-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org