Well, we all learned the lesson. Here are my own thoughts, and some more answers are inlined:

1) Explicit separate testing for all execution engines (JET, OPT, interpreter) is really valuable; so far we have found bugs in many components, including classlib (!), especially with the interpreter. I bet the JIT guys will soon be begging for the same in classlib testing, at least in CI :) See the sketch after this list for how such a run could be wired up.

2) The time required to run all pre-commit tests for DRLVM is close to unacceptable. I believe this is the main reason why patch submitters may not really exercise them. Another reason is the poor diagnostic and debug facilities of the available build infrastructure. We really need to think about how to improve this - e.g. use "sameVM" mode as much as possible, provide more reliable and convenient results reporting, a more unified harness, etc.

3) It would be nice to invest some effort in cleaning up and sorting out the available tests, smoke and kernel first of all. I suspect there may be some duplication in coverage, and it is certainly present between the DRLVM kernel and classlib luni suites. Ideally we would merge them into a common VM-kernel suite.
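For illustration, here is a rough Ant sketch of such a per-engine run. The engine-selection flags (-Xem:jet, -Xem:opt, -Xint) and the property names (drlvm.java.exe, test.classes.dir, report.dir) are my assumptions for the example, not the actual build file contents:

    <!-- Run the same JUnit suite once per execution engine.          -->
    <!-- NOTE: engine flags and property names are illustrative only. -->
    <macrodef name="run-kernel-tests">
        <attribute name="engine"/>  <!-- short name, e.g. "jet"       -->
        <attribute name="vmarg"/>   <!-- engine flag, e.g. "-Xem:jet" -->
        <sequential>
            <mkdir dir="${report.dir}/@{engine}"/>
            <junit fork="yes" jvm="${drlvm.java.exe}"
                   haltonfailure="no"
                   failureproperty="tests.failed.@{engine}"
                   printsummary="on">
                <jvmarg value="@{vmarg}"/>
                <classpath path="${test.classes.dir}"/>
                <formatter type="plain" usefile="true"/>
                <batchtest todir="${report.dir}/@{engine}">
                    <fileset dir="${test.classes.dir}" includes="**/*Test*.class"/>
                </batchtest>
            </junit>
        </sequential>
    </macrodef>

    <target name="kernel.test">
        <run-kernel-tests engine="jet" vmarg="-Xem:jet"/>
        <run-kernel-tests engine="opt" vmarg="-Xem:opt"/>
        <run-kernel-tests engine="int" vmarg="-Xint"/>
    </target>

Running all three engines this way also records one failure property per engine, which a final summary step (discussed below) can pick up.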
2006/11/16, Gregory Shimansky <[EMAIL PROTECTED]>:
Alexei Fedotov wrote:
> Guys,
>
> This is a good discussion, and let me praise Alexey for the wonderful fix.
>
> I'm a bit concerned about our acceptance checks. How could it be
> that the regression was missed by a committer and an engineer during
> acceptance test runs?
>
> Bug comments showed that Gregory ran the tests before the commit. Do
> tests report such problems clearly?

I saw that they failed on the interpreter and saw an additional failure in ClassGeneticsTest4, but when I reverted the patch I ran the test individually (because running the kernel tests is quite a long task). I saw exception output from ClassGeneticsTest4 with the patch reverted and decided that it wasn't the one to blame. I tried reverting other patches, but all of them produced the same output. Today Alexey explained to me that the exception output was just verbose test output not related to the test's passed/failed status. So the answer to your question is "not very clearly" :)
Well, JUnit always prints the total status after a run; it just takes some practice to distinguish the noise from the vital output :) But I, too, dislike stack traces in the output very much.
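One way to cut that noise (a sketch, not how the DRLVM build is actually configured today): send the per-test output, stack traces included, to report files, so only the one-line summary reaches the console:

    <!-- Per-test output (including verbose stack traces) goes to     -->
    <!-- files; only the "Tests run: N, Failures: N, Errors: N"       -->
    <!-- summary line is printed to the console.                      -->
    <junit fork="yes" printsummary="on" showoutput="false">
        <formatter type="plain" usefile="true"/>
        <classpath path="${test.classes.dir}"/>
        <batchtest todir="${report.dir}">
            <fileset dir="${test.classes.dir}" includes="**/*Test*.class"/>
        </batchtest>
    </junit>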
After I saw that reverting the patches didn't help very much, I decided that the regression had somehow slipped in earlier but wasn't noticed. The problem with the kernel tests is that they don't pass stably. Often j.l.ThreadTest fails, and j.l.RuntimeTest2 fails on XP. So I suppose a negative result of the kernel test run was assumed to be OK, and the ClassGeneticsTest4 failure went unnoticed because some of the habitually failing tests had failed as usual.
Unfortunately yes, the DRLVM tests, and the kernel tests in particular, have gained a mournful reputation. But in fact they (mostly) failed for valid reasons. Now that DRLVM is becoming more mature, we should adopt a "zero regression" policy soon.
The output of the kernel tests is not very good either. They run 3 times: on JET, OPT and the interpreter. The last output is for the interpreter, and even if it prints PASSED it is still necessary to check the results for JET and OPT. I think the tests should either <fail/> after the first run that didn't pass (similar to how the smoke tests stop execution), or print a summary at the end for all 3 runs and <fail/> in case any of them didn't pass (see the sketch below).
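The second variant could look roughly like this, building on one failure property per engine run as in the sketch earlier in the thread (the property names are illustrative, not taken from the actual build):

    <!-- After all three engine runs have completed, fail the build   -->
    <!-- once if any engine had a failing test.                       -->
    <fail message="Kernel tests failed on at least one engine; see the jet/opt/int reports.">
        <condition>
            <or>
                <isset property="tests.failed.jet"/>
                <isset property="tests.failed.opt"/>
                <isset property="tests.failed.int"/>
            </or>
        </condition>
    </fail>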
1) Fail-fast behaviour is not very informative, especially when instability is involved. It might be useful for enforcing the "zero regression" rule, but I vote for a "run them all" mode. BTW, this is how the classlib tests behave, and it is just convenient. Still, we may want to add a command-line switch à la "haltonfailure" in the junit task (a sketch is in the P.S. below).

2) I agree about a more informative summary, I will take care of it.

[snip]

--
Alexey
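P.S. A sketch of such a switch, assuming a property named test.haltonfailure (the name is only an illustration):

    <!-- "Run them all" by default; pass -Dtest.haltonfailure=yes on  -->
    <!-- the ant command line to restore fail-fast behaviour.         -->
    <property name="test.haltonfailure" value="no"/>
    <junit fork="yes" haltonfailure="${test.haltonfailure}"
           printsummary="on">
        <formatter type="plain" usefile="true"/>
        <classpath path="${test.classes.dir}"/>
        <batchtest todir="${report.dir}">
            <fileset dir="${test.classes.dir}" includes="**/*Test*.class"/>
        </batchtest>
    </junit>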