The saga of concurrent tests continues. We had to roll back "concurrent-by-default" yesterday, for two reasons: 1) it lost track of coverage numbers, so our coverage report dropped to ~60% (I'm going to work on making sure that a drop of that size fails the build in the future); and 2) it made local debugging more difficult, by preventing breakpoint activation and by preventing you from running a single test method.

To mitigate those, once I fix the coverage issue, I'm going to turn concurrent testing on in Jenkins, but leave it disabled by default locally. If you want faster tests, you'll need to explicitly pass the -p/--processes argument to paver/nosetests when you run the test suite. (Remember, --processes=-1 uses all available cores.)
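For example, an invocation along the lines of paver test_system -s lms --processes=-1 should fan the tests out across all of your cores (the test_system task name here is illustrative; nosetests also accepts --processes=-1 directly).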
Thanks for your patience on this.

-Cale

On Thu, Jun 9, 2016 at 10:44 AM, Calen Pennington <[email protected]> wrote:

> At long last, I'm prepared to re-merge test concurrency and randomization (for the LMS tests). I've worked through a largish number of different ways that tests can fail because of different orderings, and I thought I'd share them here, along with the debugging techniques that I've used to find them.
>
> So, first, the list of types of failures:
>
> Problem: Language changes leaking out of tests. The LocaleMiddleware doesn't reset the django translator at the end of a request. This is fine under normal operation, because the next request will set it to the user's language, but in tests it can be a problem if the next test after a language-change test doesn't go through the LocaleMiddleware.
> Solution: Call django.utils.translation.deactivate as a test cleanup action (see the sketch after this list).
> Example: https://github.com/edx/edx-platform/pull/12693
>
> Problem: Memory state can change the order of dicts and sets. So, if you use a set for uniqueness, but then convert the results to a list (https://github.com/edx/edx-platform/blob/master/lms/djangoapps/ccx/api/v0/views.py#L168), the order of the data on the way out may change.
> Solution: Use assertItemsEqual when comparing lists made from sets. (This was a subtle one.)
> Example: https://github.com/edx/edx-platform/pull/12674
>
> Problem: Monkey-patches aren't cleaned up after tests. If you just use Class.method = mock_method, then that patch isn't reset when your test is over, and could break other tests that attempt to use that same method.
> Solution: Use the patch function from the mock library.
> Example: https://github.com/edx/edx-platform/pull/12673
>
> Problem: Course content is wrong in the middle of a test. This can happen if a test inside a SharedModuleStoreTestCase makes modifications to course content. Those modifications aren't reset at the end of the test, and future tests end up using the modified course (without expecting to).
> Solution: Use a separate ModuleStoreTestCase for the tests that modify the course.
> Example: https://github.com/edx/edx-platform/pull/12672
>
> Problem: Django reverse doesn't find the urls that a test expects. The django url configuration is set at import time. Tests which modify settings to change urls.py (directly or indirectly) won't see those changes unless they reset the url configuration.
> Solution: Use the UrlResetMixin class to reset the url configuration, so that urls.py is re-imported before and after tests that change it.
> Example: https://github.com/edx/edx-platform/pull/12670
>
> Problem: Test configuration data in global dictionaries is changed by specific tests. If those tests don't use a copy of the dictionary, or don't reset their changes at the end of the test, other tests using the same dictionary will be affected. This can happen even when the dictionary is passed to the override_settings decorator.
> Solution: Use a copy of the global dictionary when overriding settings values or passing it into a request.
> Examples: https://github.com/edx/edx-platform/pull/12668, https://github.com/edx/edx-platform/pull/12626
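> To make a few of the fixes above concrete, here is a minimal sketch of the cleanup patterns (a sketch only, not code from the linked PRs; settings.FEATURES stands in for any global settings dict, and SOME_FLAG is a placeholder):
>
>     import copy
>     from datetime import datetime
>
>     from django.conf import settings
>     from django.test import TestCase, override_settings
>     from django.utils import timezone, translation
>     from django.utils.timezone import utc
>     from mock import patch
>
>     class OrderIndependentTests(TestCase):
>         def setUp(self):
>             super(OrderIndependentTests, self).setUp()
>             # Reset the active language no matter how the test exits, so
>             # a language change can't leak into whichever test runs next.
>             self.addCleanup(translation.deactivate)
>
>         def test_list_built_from_a_set(self):
>             # Sets have no guaranteed order, so compare contents, not order.
>             results = list({'b', 'a', 'c'})
>             self.assertItemsEqual(results, ['a', 'b', 'c'])
>
>         @patch('django.utils.timezone.now')
>         def test_patched_clock(self, mock_now):
>             # mock.patch undoes itself when the test finishes, unlike a
>             # bare ``timezone.now = fake_now`` assignment, which leaks.
>             mock_now.return_value = datetime(2016, 6, 9, tzinfo=utc)
>             self.assertEqual(timezone.now().year, 2016)
>
>         def test_feature_flag_override(self):
>             # Copy the global dict before mutating it, so other tests that
>             # read settings.FEATURES see the original object unchanged.
>             features = copy.deepcopy(settings.FEATURES)
>             features['SOME_FLAG'] = True  # placeholder flag name
>             with override_settings(FEATURES=features):
>                 self.assertTrue(settings.FEATURES['SOME_FLAG'])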
>
> Problem: Uploaded files are left around after tests. When django saves an uploaded file, if the filename already exists, the new file is given a random suffix. If your test relies on the filename, then it may break unexpectedly because of the suffix.
> Solution: Delete any uploaded files at the end of the test.
> Example: https://github.com/edx/edx-platform/pull/12629
>
> Problem: Manually executing middleware process_request methods may change global state that should be cleaned up by running the corresponding process_response method.
> Solution: Use the django TestClient to test views, rather than calling the view directly (because the TestClient runs all enabled middleware).
> Example: https://github.com/edx/edx-platform/pull/12629
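> A minimal sketch of that TestClient pattern (the 'dashboard' url name is just an example):
>
>     from django.core.urlresolvers import reverse
>     from django.test import TestCase
>
>     class MiddlewareCleanupTest(TestCase):
>         def test_view_through_the_test_client(self):
>             # The test client drives the full request/response cycle, so
>             # every enabled middleware runs both process_request and
>             # process_response, cleaning up any global state it touched.
>             response = self.client.get(reverse('dashboard'))
>             self.assertEqual(response.status_code, 200)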
>
> Debugging Tips:
>
> When the python unit tests fail unexpectedly, and you think it might be an ordering problem, try the following. First, attempt to reproduce locally by running the same shard with the same randomized seed. This means using the --attrs=... argument found in the Jenkins build to specify the subset of tests to run, and then using the --randomly-seed=... argument that is logged at the start of the test run (also in the Jenkins build log). Pass those values in the --extra_args="..." parameter to paver when starting your tests. When you do that, hopefully you will see the same failure (which would indicate that it's likely an ordering-based failure). At that point, you can attempt to reproduce it with a smaller test case. I was typically able to find the ordering issue by running the test that failed along with the test that ran immediately before it (although occasionally it was a test that ran significantly ahead of the failing test that left the suite in a bad state).
>
> A final note: one problem that I haven't been able to solve is an intermittent segmentation fault in sqlite. It appears to be the same segfault mentioned in issue #24080 (https://code.djangoproject.com/ticket/24080) on the django tracker, and unfortunately appears to be related simply to a large number of queries getting sqlite into a bad state. It doesn't appear to be something that we can fix purely on the django side, and upgrading sqlite is fraught with difficulty because of the number of packages that depend on it. If you hit that failure, please just re-run your unit tests (via "jenkins run python" in a PR comment), and hopefully you'll get a better ordering the next time around. We'll be monitoring the frequency of these segfaults, and if they get too disruptive, we'll aim to put more weight behind resolving the issue.
>
> Thanks
> -Cale
>
> On Fri, May 20, 2016 at 9:30 AM, Calen Pennington <[email protected]> wrote:
>
>> I merged the changes yesterday afternoon, and sadly they started making the master build fail with a bunch of different intermittent failures. So, I turned off randomization and concurrency, and will be making separate PRs to debug each of the failures independently. You can still enable randomization and concurrency yourselves using the --randomize and --processes=-1 flags (but be warned, there may be odd failures).
>>
>> -Cale
>>
>> On Thu, May 19, 2016 at 11:55 AM, Calen Pennington <[email protected]> wrote:
>>
>>> I've gotten a few questions about how much this actually speeds up the tests. I don't have rigorous numbers, but I've seen this shave ~2-4 minutes off of the actual testing part of the LMS shards on the Jenkins workers (from about 13-18 minutes to about 10-11 minutes). I also ran it on a much larger box (with 8 cores, instead of the 2 on our typical Jenkins worker), and I was able to run all of the LMS tests in under 10 minutes (on a single box).
>>>
>>> -Cale
>>>
>>> On Thu, May 19, 2016 at 10:27 AM, Calen Pennington <[email protected]> wrote:
>>>
>>>> Hi, all,
>>>>
>>>> I'm just finishing up a few changes to enable concurrent unit testing for the LMS unit tests in edx-platform. This will be turned on by default, and will let the tests run across all of the available cores on your machine, which should speed them up noticeably.
>>>>
>>>> Nothing is for free, though, so there are some caveats:
>>>>
>>>> 1) Concurrency introduces some amount of indeterminacy in the order of the tests. So, test classes may run in different orders than you've seen them before, and might run in different orders between runs of the test suite. This is likely not a problem, unless you've accidentally introduced a dependency between your test classes. (Fixing those historical dependencies is actually what I've spent much of this project working on.)
>>>>
>>>> 2) Because we're already getting some re-ordering of tests, I installed and enabled the 'randomly' plugin for nose, which randomizes the order of all of the tests. Again, not a problem, unless test classes or test methods have ordering dependencies.
>>>>
>>>> Both of these changes should help root out inter-test dependencies. When tests depend on each other, it's hard to distinguish actual failures from failures that happen purely because earlier tests haven't run, which can slow down future development. Ensuring that our tests can run in any order will make future development easier.
>>>>
>>>> The new features also add a couple of options to paver: '--no-randomize' will turn off randomization. '--processes=N' will tell the system to run only N processes (useful if you don't want the tests using all of your CPUs). '--extra_args="--randomly-seed=N"' can be useful to reproduce a particular ordering of tests, in case you are seeing what appears to be an order-dependent failure. You can find the seed used during a test run in the test output (when enabled, randomly will add the line 'Using --randomly-seed=XXXXX' to the test output).
>>>>
>>>> If you do see a failure that seems intermittent, please let me know. As I've been working on this, I've seen a couple myself, but they are often hard to reproduce and fix.
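>>>> For example, to pin down a suspect ordering on a fixed number of processes, you might run something like paver test_system -s lms --processes=4 --extra_args="--randomly-seed=12345" (the test_system task name and the seed are illustrative; take the real seed from the failing run's output).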
>>>>
>>>> As I've been making these changes, here are a number of the failure modes I've had to deal with:
>>>>
>>>> 1) Small-value changes to the number of queries being measured by assertNumQueries: this is usually caused by cache dependencies, particularly for things like LanguagePreferences or EmbargoStatus. Usually, these numbers can just be adjusted.
>>>>
>>>> 2) Suddenly-missing data: this is often related to a cache that used to be enabled now being disabled by default. The CacheIsolationTestCase (which is a base class of ModuleStoreTestCase) allows you to specifically enable caches for certain tests.
>>>>
>>>> 3) Unexpected data: often this is indicative of a resource being shared between concurrently running tests. In my changes, I've isolated the modulestore, the sqlite database used by django, and specific instances of the filesystem (by separating directories). Typically, the solution involves putting a unique identifier in the name of the resource to prevent collisions between test runs.
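>>>> As a minimal sketch of that unique-identifier pattern (an illustration, not code from the PR):
>>>>
>>>>     import os
>>>>     import tempfile
>>>>     import uuid
>>>>
>>>>     def isolated_dir(prefix):
>>>>         """Make a scratch directory whose name can't collide with the
>>>>         directories used by other concurrently running test processes."""
>>>>         path = os.path.join(tempfile.gettempdir(), '{}_{}'.format(prefix, uuid.uuid4().hex))
>>>>         os.makedirs(path)
>>>>         return path
>>>>
>>>> (tempfile.mkdtemp does the same job for throwaway directories; the point is just that every worker ends up with a name of its own.)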
>>>>
>>>> Thanks
>>>> -Cale
>>>>
>>>> p.s. If you want to follow the progress of the pull request, it's https://github.com/edx/edx-platform/pull/12386