[chromium-dev] Re: revising the output from run_webkit_tests
Anyone who wants to follow along on this: I've filed
http://code.google.com/p/chromium/issues/detail?id=26659 to track it.

-- Dirk

On Sat, Oct 24, 2009 at 5:01 PM, Dirk Pranke wrote:
> Sure. I was floating the idea first before doing any work, but I'll
> just grab an existing test run and hack it up for comparison ...
>
> -- Dirk
>
> On Fri, Oct 23, 2009 at 3:51 PM, Ojan Vafai wrote:
>> Can you give example outputs for the common cases? It would be easier
>> to discuss those.
>>
>> On Fri, Oct 23, 2009 at 3:43 PM, Dirk Pranke wrote:
>>>
>>> If you've never run run_webkit_tests to run the layout test
>>> regression, or don't care about it, you can stop reading ...
>>>
>>> If you have run it, and you're like me, you've probably wondered a
>>> lot about the output ... questions like:
>>>
>>> 1) What do the numbers printed at the beginning of the test mean?
>>> 2) What do all of these "test failed" messages mean, and are they bad?
>>> 3) What do the numbers printed at the end of the test mean?
>>> 4) Why are the numbers at the end different from the numbers at the
>>> beginning?
>>> 5) Did my regression run cleanly, or not?
>>>
>>> You may have also wondered a couple of other things:
>>> 6) What do we expect this test to do?
>>> 7) Where is the baseline for this test?
>>> 8) What is the baseline search path for this test?
>>>
>>> Having just spent a week trying (again) to reconcile the numbers I'm
>>> getting on the LTTF dashboard with what we print out in the test, I'm
>>> thinking about drastically revising the output from the script,
>>> roughly as follows:
>>>
>>> * print the information needed to reproduce the test run and look at
>>> the results
>>> * print the expected results in summary form (roughly the expanded
>>> version of the first table in the dashboard: # of tests by
>>> (wontfix/fix/defer x pass/fail/flaky))
>>> * don't print failure text to the screen during the run
>>> * print any *unexpected* results at the end (like we do today)
>>>
>>> The goal is that if all of your tests pass, you get less than a small
>>> screenful of output from running the tests.
>>>
>>> In addition, we would record a full log of (test, expectation,
>>> result) to the results directory (and this would also be available
>>> onscreen with --verbose).
>>>
>>> Lastly, I'll add a flag to re-run the tests that just failed, so it's
>>> easy to check whether the failures were flaky.
>>>
>>> Then I'll rip out as much of the set logic in test_expectations.py as
>>> we can possibly get away with, so that no one has to spend the week I
>>> just did again. I'll probably replace it with much of the logic I use
>>> to generate the dashboard, which is much more flexible in terms of
>>> extracting different types of queries and numbers.
>>>
>>> I think the net result will be the same level of information that we
>>> get today, just in a much more meaningful form.
>>>
>>> Thoughts? Comments? Is anyone particularly wedded to the existing
>>> output, or worried about losing a particular piece of info?
>>>
>>> -- Dirk
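[Since run_webkit_tests is a Python script, the proposed full log of
(test, expectation, result) tuples might be sketched roughly as follows.
This is a minimal illustration only: the function name, the
full_results.txt file name, and the shape of the results iterable are
all assumptions, not the actual implementation.]

    import os

    def record_results_log(results_dir, results, verbose=False):
        # Write one "(test, expectation, result)" line per test to the
        # results directory; optionally mirror the log to the screen,
        # as the proposed --verbose behavior would.
        log_path = os.path.join(results_dir, 'full_results.txt')
        with open(log_path, 'w') as log_file:
            for test_name, expectation, actual in results:
                line = '%s %s %s' % (test_name, expectation, actual)
                log_file.write(line + '\n')
                if verbose:
                    print(line)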
[chromium-dev] Re: revising the output from run_webkit_tests
Sure. I was floating the idea first before doing any work, but I'll just
grab an existing test run and hack it up for comparison ...

-- Dirk

On Fri, Oct 23, 2009 at 3:51 PM, Ojan Vafai wrote:
> Can you give example outputs for the common cases? It would be easier
> to discuss those.
>
> [snip -- full proposal quoted in the first message above]
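[In the meantime, here is a purely hypothetical mock of what the revised
output for a clean run might look like; every name and number below is
invented for illustration, not taken from a real run.]

    Expectations: 10500 tests are expected to pass; 300 to fail
      (150 WONTFIX, 100 FIX, 50 DEFER); 200 are flaky.
    Results dir: /tmp/run_webkit_tests_results
    ...
    All 11000 test results matched expectations.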
[chromium-dev] Re: revising the output from run_webkit_tests
On Fri, Oct 23, 2009 at 3:43 PM, Dirk Pranke wrote:
> [snip -- full proposal quoted in the first message above]
>
> Lastly, I'll add a flag to re-run the tests that just failed, so it's
> easy to check whether the failures were flaky.

This would be nice for the buildbots. We would also need to add a new
section in the results for Unexpected Flaky Tests (tests that failed and
then passed).

Nicolas

> [remainder of the proposal snipped]
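[A sketch of how the proposed re-run flag could feed such a section,
assuming a test that fails and then passes on retry is reported as
unexpectedly flaky rather than failing; the helper name and both
argument shapes are invented for illustration.]

    def classify_after_retry(first_run_failures, retry_results):
        # Split the tests that failed on the first pass into genuinely
        # failing tests and unexpectedly flaky ones (failed, then
        # passed). retry_results maps test name -> True if the retry
        # passed; a test missing from it is treated as still failing.
        unexpected_flaky = []
        still_failing = []
        for test in first_run_failures:
            if retry_results.get(test):
                unexpected_flaky.append(test)  # failed, then passed
            else:
                still_failing.append(test)

        if unexpected_flaky:
            print('Unexpected Flaky Tests (failed, then passed):')
            for test in sorted(unexpected_flaky):
                print('  ' + test)
        return still_failing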
[chromium-dev] Re: revising the output from run_webkit_tests
Can you give example outputs for the common cases? It would be easier to
discuss those.

On Fri, Oct 23, 2009 at 3:43 PM, Dirk Pranke wrote:
> [snip -- full proposal quoted in the first message above]
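[The "expected results in summary form" bullet in the proposal amounts
to a small cross-tabulation. A minimal sketch, assuming each test
carries one modifier (WONTFIX/FIX/DEFER) and one expected outcome
(PASS/FAIL/FLAKY); the expectations mapping used here is a hypothetical
stand-in, not what test_expectations.py actually stores.]

    from collections import defaultdict

    MODIFIERS = ('WONTFIX', 'FIX', 'DEFER')
    OUTCOMES = ('PASS', 'FAIL', 'FLAKY')

    def print_expectation_summary(expectations):
        # expectations maps test name -> (modifier, outcome).
        # Count tests in each (modifier x outcome) bucket ...
        counts = defaultdict(int)
        for modifier, outcome in expectations.values():
            counts[(modifier, outcome)] += 1

        # ... then print a #-of-tests-by-(modifier x outcome) table.
        print('%-8s' % '' + ''.join('%8s' % o for o in OUTCOMES))
        for modifier in MODIFIERS:
            row = '%-8s' % modifier
            row += ''.join('%8d' % counts[(modifier, o)]
                           for o in OUTCOMES)
            print(row)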