Well, we have all learned the lesson. Here are my own thoughts; more
answers are inlined below:
1) Explicit separate testing for all execution engines (JET, OPT,
interpreter) is really valuable; so far we have found bugs in many
components, including classlib (!), especially thanks to the
interpreter. I bet the JIT guys will soon be asking for the same in
classlib testing, at least in CI :)
2) The time required to run all pre-commit tests for DRLVM is close to
unacceptable. I believe this is the main reason why patch submitters
may not really exercise them. Another reason is the poor diagnostic &
debug facilities of the available build infrastructure.
We really need to think about how to improve this - e.g. use "sameVM"
mode as much as possible, provide more reliable and convenient results
reporting, a more unified harness, etc. (there is a rough sketch after
these points).
3) It would be nice to invest some effort in cleaning up & sorting out
the available tests, smoke and kernel first of all. I suspect there is
some duplication in coverage, and it is certainly present between the
DRLVM-kernel and classlib-luni suites. Ideally we would merge them
into a common VM-kernel suite.
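
To make 1) and 2) more concrete, below is a rough Ant sketch of what I
mean: run the same kernel suite once per execution engine, but let each
run share a single forked VM across test classes (forkmode="once")
instead of forking per test - not literally "sameVM", but it removes
most of the startup overhead. The macro name, the property names, the
jvm/classpath references and the -Xem:jet / -Xem:opt / -Xint arguments
are placeholders I made up for the example, so please treat it as an
illustration, not as something that drops into our build.xml as is:

<!-- Hypothetical macro: one kernel-tests run per execution engine.       -->
<!-- forkmode="once" starts a single child VM per run instead of one VM   -->
<!-- per test class; haltonfailure="no" lets the whole suite run through. -->
<macrodef name="run-kernel-tests">
    <attribute name="engine"/>   <!-- label: jet, opt or int -->
    <attribute name="vmarg"/>    <!-- engine selector passed to the VM -->
    <sequential>
        <junit jvm="${test.jvm}" fork="yes" forkmode="once"
               haltonfailure="no"
               failureproperty="kernel.failed.@{engine}"
               printsummary="on">
            <jvmarg value="@{vmarg}"/>
            <classpath refid="kernel.test.classpath"/>
            <formatter type="brief" usefile="false"/>
            <batchtest>
                <fileset dir="${kernel.test.classes}" includes="**/*Test*.class"/>
            </batchtest>
        </junit>
    </sequential>
</macrodef>

<target name="kernel.test">
    <run-kernel-tests engine="jet" vmarg="-Xem:jet"/>
    <run-kernel-tests engine="opt" vmarg="-Xem:opt"/>
    <run-kernel-tests engine="int" vmarg="-Xint"/>
</target>

The per-engine failureproperty is also what makes a final per-engine
summary possible (see below).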

2006/11/16, Gregory Shimansky <[EMAIL PROTECTED]>:
Alexei Fedotov wrote:
> Guys,
>
> This is a good discussion, and let me praise Alexey for the wonderful fix.
>
> I'm a bit concerned about our acceptance checks. How could it be
> that the regression was missed by a committer and an engineer during
> acceptance test runs?
>
> Bug comments showed that Gregory ran the tests before a commit. Do
> tests report such problems clearly?

I saw that they failed on the interpreter and saw an additional failure in
ClassGeneticsTest4, but when I reverted the patch I ran that test
individually (because running the kernel tests is quite a long task). I saw
exception output from ClassGeneticsTest4 with the patch reverted and
decided that it wasn't the one to blame. I tried reverting other patches,
but all of them produced the same output. Today Alexey explained to me
that the exception output was just verbose test output, not related to
the test passed/failed status. So the answer to your question is "not very
clearly" :)
Well, junit always prints the total status after a run; it just takes
some practice to distinguish the noise from the vital output :) But I,
too, dislike stack traces in the output very much.
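For what it is worth, most of that noise can be pushed out of the
console and into report files by the junit task itself; something along
these lines (again only a sketch - the property names are made up, and
I have not checked how this interacts with our current scripts):

<!-- showoutput="no" keeps the tests' System.out/err out of the Ant log, -->
<!-- the "brief" formatter prints details only for failed tests, and the -->
<!-- XML reports keep the full stack traces for later digging.           -->
<junit fork="yes" forkmode="once" showoutput="no" printsummary="on"
       failureproperty="kernel.failed">
    <formatter type="brief" usefile="false"/>
    <formatter type="xml"/>
    <batchtest todir="${kernel.report.dir}">
        <fileset dir="${kernel.test.classes}" includes="**/*Test*.class"/>
    </batchtest>
</junit>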

After I saw that reverting the patches didn't help much, I decided
that the regression had somehow slipped in earlier but wasn't noticed. The
problem with the kernel tests is that they don't pass stably. Often
j.l.ThreadTest fails, and j.l.RuntimeTest2 fails on XP. So I assumed the
negative result of the kernel tests run was ok, and the
ClassGeneticsTest4 failure went unnoticed among the tests that fail
often anyway.
Unfortunately yes, the drlvm tests, and the kernel suite in particular,
have gained a somewhat mournful reputation. But in fact, they (mostly)
failed for valid reasons. Now that DRLVM is becoming more mature, we
should adopt a "zero regression" policy soon.

The output of the kernel tests is not very good either. They run 3 times,
on JET, OPT and the interpreter. The last output is for the interpreter,
and even if it prints PASSED it is still necessary to check the results
for JET and OPT. I think the tests should either <fail/> after the first
run that didn't pass (similar to how the smoke tests stop execution), or
print a summary at the end for all 3 runs and <fail/> in case any of them
didn't pass.
1) Fail-fast behaviour is not very informative, especially when
instability is involved. It might be useful for enforcing the "zero
regression" rule, but I vote for the "run them all" mode. BTW, this is
how the classlib tests behave, and it is just convenient. Still, we may
want to add a command-line switch a la "haltonfailure" in the junit
task (a sketch follows below).
2) I agree about a more informative summary, I will take care of that.
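
For the record, here is roughly what I have in mind for both the switch
and the summary, building on the per-engine sketch earlier in this mail;
the test.haltonfailure and kernel.failed.* property names are invented
for the example and do not exist in our build yet:

<!-- A command-line switch a la junit's haltonfailure: off by default;   -->
<!-- running "ant -Dtest.haltonfailure=true" would stop after the first  -->
<!-- failed run, provided the macro above uses                           -->
<!-- haltonfailure="${test.haltonfailure}" instead of "no".               -->
<property name="test.haltonfailure" value="false"/>

<!-- Summary and a single verdict at the very end of the kernel run.     -->
<target name="kernel.test.report">
    <!-- failureproperty is set only when the corresponding run failed -->
    <condition property="kernel.status.jet" value="FAILED" else="PASSED">
        <isset property="kernel.failed.jet"/>
    </condition>
    <condition property="kernel.status.opt" value="FAILED" else="PASSED">
        <isset property="kernel.failed.opt"/>
    </condition>
    <condition property="kernel.status.int" value="FAILED" else="PASSED">
        <isset property="kernel.failed.int"/>
    </condition>
    <echo>kernel tests: JET=${kernel.status.jet} OPT=${kernel.status.opt} INT=${kernel.status.int}</echo>

    <!-- break the build if any of the three runs failed -->
    <fail message="kernel tests failed on at least one execution engine">
        <condition>
            <or>
                <isset property="kernel.failed.jet"/>
                <isset property="kernel.failed.opt"/>
                <isset property="kernel.failed.int"/>
            </or>
        </condition>
    </fail>
</target>

This way the suite always runs to the end by default, still prints one
line that tells at a glance which engines passed, and the build as a
whole goes red if anything failed on any engine.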

[snip]

--
Alexey
