By default, qa.run logs at INFO level (which is what I used for the bulk run above).

When I need more logging, I specify my own logging.properties file, based on what is in qa1.logging, and fine-tune the logging settings depending on what I am testing. You point the harness at it in your build.properties, for instance:

log.config=/home/jonathan/logging.properties
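For example, a minimal logging.properties along these lines (written from memory; the JoinManager logger name, and whether the RetryTask "retry of" messages you ask about below actually come through it, are assumptions to check against the javadoc):

    handlers = java.util.logging.ConsoleHandler
    java.util.logging.ConsoleHandler.level = FINEST

    # keep everything else at the default level
    .level = INFO

    # raise only the area under investigation, e.g. join/retry behaviour
    net.jini.lookup.JoinManager.level = FINEST

The harness then picks that file up through the log.config property above.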
2010/8/27 Patricia Shanahan <[email protected]>

> Excellent! Once the servicediscovery regression is fixed that can be
> added.
>
> Do you run your tests with logging enabled, and if so at what level? I
> have a specific coverage issue involving JoinManager and RetryTask. As
> far as I can tell, we are not testing what happens when a RetryTask has
> to do a Retry, and I believe tasks can get out of order in undesirable
> ways when that happens. If retries are being tested, at the FINEST
> logging level we would see messages from RetryTask containing "retry of".
>
> I would like to know about any tests that produce those messages.
>
> Patricia
>
> On 8/27/2010 3:29 AM, Jonathan Costers wrote:
>
>> I just ran the set of tests that are currently being selected when
>> executing the qa.run target, after I added a couple more categories:
>>
>> # of tests started = 497
>> # of tests completed = 497
>> # of tests skipped = 21
>> # of tests passed = 497
>> # of tests failed = 0
>>
>> -----------------------------------------
>>
>> Date finished: Fri Aug 27 12:21:04 CEST 2010
>> Time elapsed: 27258 seconds
>>
>> BUILD SUCCESSFUL (total time: 454 minutes 20 seconds)
>>
>> The categories that are run are:
>>
>> id,loader,policyprovider,locatordiscovery,activation,config,discoverymanager,joinmanager,url,iiop,jrmp,reliability,thread,renewalmanager,constraint,export,lookupdiscovery
>>
>> Looks like we almost have 50% coverage now (about 500 tests out of
>> 1000+).
>>
>> On my system (an Intel Quad Core with 4GB of memory), this took 7-8
>> hours to run.
>>
>> 2010/8/27 Patricia Shanahan <[email protected]>
>>
>>> That would be ideal. However, an infrequent run of a very large test
>>> set can be managed manually, with check lists.
>>>
>>> Patricia
>>>
>>> Jonathan Costers wrote:
>>>
>>>> The QA harness is also supposed to be able to work in distributed
>>>> mode, i.e. having multiple machines work together on one test run
>>>> (splitting the work, so to speak). I haven't looked into that feature
>>>> too much, though.
>>>>
>>>> 2010/8/27 Patricia Shanahan <[email protected]>
>>>>
>>>>> Based on some experiments, I am convinced a full run may take more
>>>>> than 24 hours, so even that may have to be selective. Jonathan
>>>>> Costers reports killing a full run after several days. We may need
>>>>> three targets, in addition to problem-specific categories:
>>>>>
>>>>> 1. A quick test that one would do, for example, after checking out
>>>>> and building.
>>>>>
>>>>> 2. A more substantive test that would run in less than 24 hours, to
>>>>> do each day.
>>>>>
>>>>> 3. A complete test that might take several machine-days, and that
>>>>> would be run against a release candidate prior to release.
>>>>>
>>>>> Note that even if a test sequence takes several machine-days, that
>>>>> does not necessarily mean days of elapsed time. Maybe some tests can
>>>>> be run in parallel under the same OS copy. Even if that is not
>>>>> possible, we may be able to gang up several physical or virtual
>>>>> machines, each running a subset of the tests.
>>>>>
>>>>> I think virtual machines may work quite well because a lot of the
>>>>> tests do something then wait around a minute or two to see what
>>>>> happens. They are not very intensive resource users.
>>>>>
>>>>> Patricia
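As a concrete sketch of those three targets: the qa build file could grow thin wrappers around the existing run-categories target, roughly along these lines. I haven't checked how run-categories actually receives its category list, so the target names, the "categories" param and the groupings below are illustrative guesses, not what build.xml defines today:

    <!-- quick sanity run after checking out and building -->
    <target name="qa.run.quick">
        <antcall target="run-categories">
            <param name="categories" value="id,loader,joinmanager"/>
        </antcall>
    </target>

    <!-- daily run, intended to stay under 24 hours (the set from my run above) -->
    <target name="qa.run.daily">
        <antcall target="run-categories">
            <param name="categories" value="id,loader,policyprovider,locatordiscovery,activation,config,discoverymanager,joinmanager,url,iiop,jrmp,reliability,thread,renewalmanager,constraint,export,lookupdiscovery"/>
        </antcall>
    </target>

A qa.run.all (or Peter's qa.run.hudson) for release candidates would follow the same pattern, passing the complete category list once the problematic tests are tagged to be skipped.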
>>>>> Peter Firmstone wrote:
>>>>>
>>>>>> Hi JC,
>>>>>>
>>>>>> Can we have an ant target for running all the tests?
>>>>>>
>>>>>> And how about a qa.run.hudson target?
>>>>>>
>>>>>> I usually use run-categories, to isolate what I'm working on, but we
>>>>>> definitely need a target that runs everything that should be, even
>>>>>> if it does take overnight.
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Peter.
>>>>>>
>>>>>> Jonathan Costers wrote:
>>>>>>
>>>>>>> 2010/8/24 Patricia Shanahan <[email protected]>
>>>>>>>
>>>>>>>> On 8/22/2010 4:57 PM, Peter Firmstone wrote:
>>>>>>>>
>>>>>>>>> ...
>>>>>>>>> Thanks Patricia, that's very helpful. I'll figure out where I
>>>>>>>>> went wrong this week; it really shows the importance of full test
>>>>>>>>> coverage.
>>>>>>>>> ...
>>>>>>>>
>>>>>>>> I strongly agree that test coverage is important. Accordingly,
>>>>>>>> I've done some analysis of the "ant qa.run" output.
>>>>>>>>
>>>>>>>> There are 1059 test description (*.td) files that exist, and are
>>>>>>>> loaded at the start of "ant qa.run", but that do not seem to be
>>>>>>>> run. I've extracted the top level categories from those files:
>>>>>>>>
>>>>>>>> constraint
>>>>>>>> discoveryproviders_impl
>>>>>>>> discoveryservice
>>>>>>>> end2end
>>>>>>>> eventmailbox
>>>>>>>> export_spec
>>>>>>>> io
>>>>>>>> javaspace
>>>>>>>> jeri
>>>>>>>> joinmanager
>>>>>>>> jrmp
>>>>>>>> loader
>>>>>>>> locatordiscovery
>>>>>>>> lookupdiscovery
>>>>>>>> lookupservice
>>>>>>>> proxytrust
>>>>>>>> reliability
>>>>>>>> renewalmanager
>>>>>>>> renewalservice
>>>>>>>> scalability
>>>>>>>> security
>>>>>>>> start
>>>>>>>> txnmanager
>>>>>>>>
>>>>>>>> I'm sure some of these tests are obsolete, duplicates of tests in
>>>>>>>> categories that are being run, or otherwise inappropriate, but
>>>>>>>> there does seem to be a rich vein of tests we could mine.
>>>>>>>
>>>>>>> The QA harness loads all .td files under the "spec" and "impl"
>>>>>>> directories when starting, and only retains the ones that are
>>>>>>> tagged with the categories that we specify in the Ant target.
>>>>>>> Whenever a test is really obsolete or otherwise not supposed to
>>>>>>> run, it is marked with a "SkipTestVerifier" in its .td file. Most
>>>>>>> of these tests are genuine and should be run, though.
>>>>>>> There are more categories than the ones you mention above, for
>>>>>>> instance "spec", "id", "id_spec", etc.
>>>>>>> Also, some tests are tagged with multiple categories, so duplicates
>>>>>>> can exist when assembling the list of tests to run.
>>>>>>>
>>>>>>> The reason not all of them are run (by Hudson) now is that we give
>>>>>>> a specific set of test categories that are known (to me) to run
>>>>>>> smoothly. There are many others that are not run (by default)
>>>>>>> because issues are present with one or more of the tests in that
>>>>>>> category.
>>>>>>>
>>>>>>> I completely agree that we should not exclude complete test
>>>>>>> categories because of one test failing.
>>>>>>> What we probably should do is tag any problematic test (due to
>>>>>>> infrastructure or other reasons) with a SkipTestVerifier for the
>>>>>>> time being, so that it is not taken into account by the QA harness
>>>>>>> for now. That way, we can add all test categories to the default
>>>>>>> Ant run. However, this would take a large amount of time to run
>>>>>>> (I've tried it once, and killed the process after several days),
>>>>>>> which brings us to your next point:
>>>>>>>
>>>>>>>> Part of the problem may be time to run the tests. I'd like to
>>>>>>>> propose splitting the tests into two sets:
>>>>>>>>
>>>>>>>> 1. A small set that one would run in addition to the relevant
>>>>>>>> tests, whenever making a small change. It should *not* be based on
>>>>>>>> skipping complete categories, but on doing those tests from each
>>>>>>>> category that are most likely to detect regression, especially
>>>>>>>> regression due to changes in other areas.
>>>>>>>
>>>>>>> Completely agree. However, most of the QA tests are not clear unit
>>>>>>> or regression tests. They are more integration/conformance tests
>>>>>>> that test the requirements of the spec and its implementation.
>>>>>>> Identifying the list of "right" tests to run as part of the small
>>>>>>> set you mention would require going through all 1059 test
>>>>>>> descriptions and their sources.
>>>>>>>
>>>>>>>> 2. A full test set that may take a lot longer. In many projects,
>>>>>>>> there is a "nightly build" and a test sequence that is run against
>>>>>>>> that build. That test sequence can take up to 24 hours to run, and
>>>>>>>> should be as complete as possible. Does Apache have infrastructure
>>>>>>>> to support this sort of operation?
>>>>>>>
>>>>>>> Again, completely agree. I'm sure Apache supports this through
>>>>>>> Hudson. We could request to set up a second build job, doing
>>>>>>> nightly builds and running the whole test suite. I think this is
>>>>>>> the only way to make running the complete QA suite automatically
>>>>>>> practical.
>>>>>>>
>>>>>>>> Are there any tests that people *know* should not run? I'm
>>>>>>>> thinking of running the lot just to see what happens, but knowing
>>>>>>>> ones that are not expected to work would help with result
>>>>>>>> interpretation.
>>>>>>>
>>>>>>> See above: tests of that type should have already been tagged to be
>>>>>>> skipped by the good people that donated this test suite. I've
>>>>>>> noticed that usually, when a SkipTestVerifier is used in a .td
>>>>>>> file, someone has put some comments in there to explain why it was
>>>>>>> tagged as such.
>>>>>>>
>>>>>>>> Patricia
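As for finding the tests that are already tagged: something like the following lists the .td files that carry a SkipTestVerifier and shows the comments around each tag (the qa/src location is from memory, so adjust the path to wherever the test descriptions live in your checkout):

    # test descriptions already marked to be skipped
    grep -rl --include='*.td' SkipTestVerifier qa/src

    # same, with a few lines of leading context to see the explanatory comments
    grep -rn -B 3 --include='*.td' SkipTestVerifier qa/src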
