Should we create bugs for each of these, and divide-and-conquer?

- Luis

On 05/15/2014 10:27 AM, Niels de Vos wrote:
On Thu, May 15, 2014 at 06:05:00PM +0530, Vijay Bellur wrote:
On 04/30/2014 07:03 PM, Justin Clift wrote:
Hi all,

Was trying out the GlusterFS regression tests in Rackspace VMs last
night for each of the release-3.4, release-3.5, and master branches.

The regression test is just a run of "run-tests.sh", from a git
checkout of the appropriate branch.
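
For anyone wanting to reproduce, the procedure is roughly this (a
minimal sketch; it assumes GlusterFS has already been built and
installed on the vm, and the github mirror URL is an assumption):

  # Fetch the source and switch to the branch under test.
  $ git clone https://github.com/gluster/glusterfs.git
  $ cd glusterfs
  $ git checkout release-3.5    # or release-3.4 / master

  # run-tests.sh sits at the top of the tree and needs root.
  $ sudo ./run-tests.sh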

The good news is we're adding a lot of testing code with each release:

  * release-3.4 -  6303 lines  (~30 mins to run test)
  * release-3.5 -  9776 lines  (~85 mins to run test)
  * master      - 11660 lines  (~90 mins to run test)

(lines counted using:
  $ find tests -type f -iname "*.t" -exec cat {} + | wc -l)

The bad news is the tests only "kind of" pass now.  I say "kind of"
because although the regression run *can* pass on each of these
branches, it's inconsistent. :(

Results from testing overnight:

  * release-3.4 - 20 runs - 17 PASS, 3 FAIL. 85% success.
    * bug-857330/normal.t failed in two separate runs
    * bug-887098-gmount-crash.t failed in one run

  * release-3.5 - 20 runs, 18 PASS, 2 FAIL. 90% success.
    * bug-857330/xml.t failed in one run
    * bug-1004744.t failed in another run (same vm for both failures)

  * master - 20 runs, 6 PASS, 14 FAIL. 30% success.
    * bug-1070734.t failed in one run
    * bug-1087198.t & bug-860663.t failed in one run (same vm as bug-1070734.t failure above)
    * bug-1087198.t & bug-857330/normal.t failed in one run (new vm, a subsequent run on same vm passed)
    * bug-1087198.t & bug-948686.t failed in one run (new vm)
    * bug-1070734.t & bug-1087198.t failed in one run (new vm)
    * bug-860663.t failed in one run
    * bug-1023974.t & bug-1087198.t & bug-948686.t failed in one run (new vm)
    * bug-1004744.t & bug-1023974.t & bug-1087198.t & bug-948686.t failed in one run (new vm)
    * bug-948686.t failed in one run (new vm)
    * bug-1070734.t failed in one run (new vm)
    * bug-1023974.t failed in one run (new vm)
    * bug-1087198.t & bug-948686.t failed in one run (new vm)
    * bug-1070734.t failed in one run (new vm)
    * bug-1087198.t failed in one run (new vm)

The occasionally-failing tests aren't completely random, which suggests
something systematic is going on.  Race conditions, maybe? (see the loop
sketch after the failure counts below for one way to probe this)

  * 8 failures - bug-1087198.t
  * 5 failures - bug-948686.t
  * 4 failures - bug-1070734.t
  * 3 failures - bug-1023974.t
  * 3 failures - bug-857330/normal.t
  * 2 failures - bug-860663.t
  * 2 failures - bug-1004744.t
  * 1 failure  - bug-857330/xml.t
  * 1 failure  - bug-887098-gmount-crash.t
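
One cheap way to chase a suspected race is to hammer the worst offender
in a loop until it trips.  A sketch (the test path is illustrative, and
this assumes the prove(1) TAP harness that run-tests.sh drives):

  # Re-run a single .t test repeatedly, stopping at the first failure.
  $ for i in $(seq 1 50); do
      echo "=== run $i ==="
      prove -vf tests/bugs/bug-1087198.t || break
    done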

Anyone have suggestions on how to make this work reliably?


I think it would be a good idea to arrive at a list of test cases that
are failing at random and assign owners to address them (default owner
being the submitter of the test case). In addition to these, I have
also seen tests like bd.t and xml.t fail pretty regularly.

Justin - can we publish a consolidated list of regression tests that
fail and owners for them on an etherpad or similar?

Fixing these test cases will enable us to bring in more jenkins
instances for parallel regression runs etc., and will also make our
regression results more deterministic. Your help in addressing the
regression test suite problems will be greatly appreciated!

Indeed, getting the regression tests stable seems like a blocker before
we can move to a scalable Jenkins solution. Unfortunately, it may not be
trivial to debug these test cases... Any suggestions on capturing useful
data that would help figure out why the test cases don't pass?
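
As a first pass, archiving the glusterfs logs and any core files
whenever a run fails might give us something to triage.  A rough sketch
(the log path is the usual default; where cores land varies by setup,
so /core* is an assumption):

  # On failure, tar up logs and cores, timestamped so successive
  # failures don't overwrite each other.
  $ ./run-tests.sh || {
      ts=$(date +%Y%m%d-%H%M%S)
      tar czf /var/tmp/regression-$ts.tar.gz \
          /var/log/glusterfs /core* 2>/dev/null
    }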

Thanks,
Niels
_______________________________________________
Gluster-devel mailing list
Gluster-devel@gluster.org
http://supercolony.gluster.org/mailman/listinfo/gluster-devel
