On 01/08/2016 03:25 PM, Emmanuel Dreyfus wrote:
On Fri, Jan 08, 2016 at 03:18:02PM +0530, Pranith Kumar Karampuri wrote:
Does the cleanup script need to be manually executed on the NetBSD
machine?
You can run the script manually, but if the goal is to restore a
misbehaving machine, rebooting is probably the fastest way to sort
the issue.

While thinking about it, I suspect there may be some benefit in
rebooting the machine if the regression does not finish within a
sane amount of time.

Rebooting because a single test led to a crash may not be a good idea. We need a reliable way of detecting that the mount hung because of a crash, and then running this cleanup script when that happens. So the question is: can we detect this state?
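
As a very rough idea of what that detection could look like, here is an untested sketch (the mount point and the cleanup script path are assumptions, not necessarily what the slaves actually use):

#!/usr/bin/env python3
# Untested sketch: treat the mount as hung if stat() on it does not return
# within a few seconds, and run the cleanup script in that case.
# MOUNT and CLEANUP are assumptions, not the real paths on the slaves.
import subprocess

MOUNT = "/mnt/glusterfs/0"
CLEANUP = "/opt/qa/cleanup.sh"

def mount_is_hung(path, timeout=10):
    # stat() on a wedged FUSE mount can block forever, so run it in a
    # child process and give up after `timeout` seconds.
    proc = subprocess.Popen(["stat", path],
                            stdout=subprocess.DEVNULL,
                            stderr=subprocess.DEVNULL)
    try:
        proc.wait(timeout=timeout)
        return proc.returncode != 0
    except subprocess.TimeoutExpired:
        proc.kill()
        return True

if __name__ == "__main__":
    if mount_is_hung(MOUNT):
        subprocess.call(["sh", CLEANUP])

Checking for a crashed glusterfs process (or a fresh core file) could be a complementary signal, but a stat probe on the mount seems like the simplest starting point.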


A first step could be to parse the Jenkins logs and find which tests fail or hang
most often in the NetBSD regression.
This work is under way. I will have to change some of the scripts I wrote to
get this information.
Great.

To avoid duplication of work, are there any tests that you are
already investigating? If not, that is the first thing I will try to find out.
No, I have not started investigating yet because I have no idea where
I should look. Your input will be very valuable.
Since we don't have the script yet, I did this manually:

Here are the results for the last 15-20 runs:

Test                                              Number of times it happened
tests/basic/afr/arbiter-statfs.t (bad status 1)   5
tests/basic/afr/self-heal.t                       1
tests/basic/afr/entry-self-heal.t                 1
tests/basic/quota-nfs.t                           2
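
To automate this counting, something along these lines might work as a starting point (untested sketch; the build-number range is a guess at the last ~20 runs, and it greps for the same "bad status" marker I looked for manually):

#!/usr/bin/env python3
# Untested sketch: count, per test, how many recent NetBSD regression runs
# reported it with "bad status" in the Jenkins console log.
# FIRST/LAST are an assumed build-number range; adjust to the runs of interest.
import re
import urllib.request
from collections import Counter

JOB = "https://build.gluster.org/job/rackspace-netbsd7-regression-triggered"
FIRST, LAST = 13264, 13283

failures = Counter()
for build in range(FIRST, LAST + 1):
    url = "%s/%d/consoleText" % (JOB, build)
    try:
        text = urllib.request.urlopen(url, timeout=60).read().decode("utf-8", "replace")
    except Exception:
        continue  # the run may have been pruned or aborted before producing a log
    for match in re.finditer(r"(\.?/?tests/\S+\.t)\s*:\s*bad status", text):
        failures[match.group(1)] += 1

for test, count in failures.most_common():
    print("%3d  %s" % (count, test))

If this roughly matches the manual counts above, it could be folded into the script changes you mentioned.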



The following happened 4 times.
One example: https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/13283/console. Another run, https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/13280/console, seems different compared to the one above.

+ '/opt/qa/build.sh'
Build timed out (after 300 minutes). Marking the build as failed.
Build was aborted
Finished: FAILURE



The following happened 4 times.
One example: https://build.gluster.org/job/rackspace-netbsd7-regression-triggered/13279/console

ERROR: Connection was broken: java.io.IOException: Unexpected EOF
        at hudson.remoting.ChunkedInputStream.readUntilBreak(ChunkedInputStream.java:99)
        at hudson.remoting.ChunkedCommandTransport.readBlock(ChunkedCommandTransport.java:39)
        at hudson.remoting.AbstractSynchronousByteArrayCommandTransport.read(AbstractSynchronousByteArrayCommandTransport.java:34)
        at hudson.remoting.SynchronousCommandTransport$ReaderThread.run(SynchronousCommandTransport.java:48)


I can take a look at why the tests are failing (on Sunday, not today :-)). Could you look at why the timeouts and the 'Connection was broken' errors are happening?

Once we find out what happened, the first goal is to detect and repair it automatically. If we can't, let us write up a wiki page or something describing how to proceed when this happens.

Pranith


_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://www.gluster.org/mailman/listinfo/gluster-devel
