Yes, that would be useful for testing applications (integration tests). Likewise, good tests for the existing demos and examples will help.
On Mon, Sep 12, 2016 at 6:13 PM, Munagala Ramanath <[email protected]> wrote:

> A good start would be to revise the archetype to include as many
> illustrative tests as reasonably possible -- people seem more willing to
> follow examples than to follow instructions.
>
> Ram
>
> On Sep 12, 2016 5:26 PM, "Thomas Weise" <[email protected]> wrote:
>
> Hi,
>
> Recently there was a bit of discussion on how to write tests for operators
> that will result in good coverage and high confidence in the results of the
> CI. Experience from past releases shows that operators with good coverage
> are less likely to break down (for users) due to subsequent changes, while
> those without coverage in the CI (think contrib) tend to break down even
> from trivial changes that would otherwise be easily caught.
>
> IMO writing good tests is as important as the main operator code (and
> documentation and examples...). It was also part of the maturity framework
> that Ashwin proposed a while ago (Ashwin, maybe you can also share a few
> points). I suggest we expand the contribution guidelines to reflect an
> agreed set of expectations that contributors can follow when submitting
> PRs, or even come up with a checklist for submitting PRs:
>
> http://apex.apache.org/malhar-contributing.html
>
> Here are a few recurring problems and suggestions, in no particular order:
>
> - Unit tests are for testing small pieces of code in isolation ("unit").
>   Running a DAG in embedded mode is not a unit test; it is an integration
>   test.
> - When writing an operator or making changes to fix bugs etc., it is
>   recommended to write or modify the granular test that exercises this
>   change and as little as possible around it. This happens before writing
>   or running an application and can be done in fast iterations inside the
>   IDE without extensive test data setup or application assembly.
> - When an operator consists of multiple other components, testing for
>   those should also be broken down into units. For example, managed state
>   is not tested by testing the dedup or join operators (which are special
>   use cases), but through separate tests that exercise the full spectrum
>   (or close to it) of managed state.
> - So what about serialization, don't I need to create a DAG to test it?
>   You only need Kryo to test serialization of an operator. Use the
>   existing utilities or contribute to utilities that are shared between
>   tests.
> - Don't I need to run a DAG to test the lifecycle of an operator? No, the
>   sequence of calls to an operator's lifecycle methods is documented (or
>   how else would I implement an operator to start with). There are quite a
>   few tests that "execute" the operator directly. They have access to the
>   state and can assert that a given process invocation produces the
>   expected changes. That is much more difficult when running a DAG.
> - I have to write a lot of code to do such testing, and possibly I will
>   forget some calls? Not when following test-driven development. IMO that
>   mostly happens when tests are written as an afterthought, and that's a
>   waste of time. I would suggest, though, developing a single operator test
>   driver that ensures all methods are called, as a basic sanity check.
> - Integration tests: with proper unit test coverage, the integration test
>   is more like an example of how to use an operator. That is nice for
>   users, because they can use it as a starting point for writing their own
>   app, including the configuration.
> - I wrote a nice integration test app with configuration. It runs for
>   exactly <n> seconds (localmode.run(n)), returns, and all looks green. It
>   even prints some nice stuff in the console. What's wrong? You have not
>   tested anything! An operator may fail in setup and the test still
>   passes. Travis CI is not reading the console (instead, it will complain
>   that the tests fill up the 4MB log too fast, and the really important
>   logs get lost). Instead, assert in your test code that the DAG execution
>   produces the expected results. Rather than waiting for <n> seconds, wait
>   until the expected results are in and cap the wait with a timeout. This
>   is yet another area where a few utilities for recurring test code will
>   come in handy.
> - Tests sometimes fail, but they work on my local machine? Every
>   environment is different, and good tests don't depend on
>   environment-specific factors (timing dependencies, excessive resource
>   utilization etc.). It is important that tests pass in the CI
>   consistently and that issues found there are investigated and fixed.
>   Isn't it nice to see the green check mark in the PR instead of having to
>   close/reopen it several times so that an unrelated flaky test does not
>   fail? If we collectively track and fix such failures, life will be
>   better for everyone.
>
> Looking forward to feedback, additions and, most importantly, volunteers
> who will help make the Apex CI better.
>
> Thanks,
> Thomas
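The lifecycle and test-driver suggestions in the list above can be illustrated without a DAG. The snippet below is a minimal, self-contained sketch, not Apex code: `LifecycleOperator`, `WordLengthSum` and `OperatorTestDriver` are hypothetical stand-ins that only mirror the setup, beginWindow, process, endWindow, teardown order of an Apex operator (real operators implement `com.datatorrent.api.Operator` and receive tuples through input ports). The driver calls every lifecycle method in order and records the calls, and the test asserts on operator state directly.

```java
import java.util.ArrayList;
import java.util.List;

public class OperatorLifecycleSketch {

    // Hypothetical stand-in for an operator's lifecycle; not the Apex API.
    interface LifecycleOperator<T> {
        void setup();
        void beginWindow(long windowId);
        void process(T tuple);
        void endWindow();
        void teardown();
    }

    // Toy operator: sums the lengths of the strings it sees in each window.
    static class WordLengthSum implements LifecycleOperator<String> {
        long windowSum;                                // state for the current window
        final List<Long> emitted = new ArrayList<>();  // stands in for an output port

        public void setup() { emitted.clear(); }
        public void beginWindow(long windowId) { windowSum = 0; }
        public void process(String tuple) { windowSum += tuple.length(); }
        public void endWindow() { emitted.add(windowSum); }
        public void teardown() { }
    }

    // Hypothetical single-operator driver: always calls every lifecycle
    // method in order and records the calls for a sanity check.
    static class OperatorTestDriver<T> {
        final List<String> calls = new ArrayList<>();

        void run(LifecycleOperator<T> op, List<List<T>> windows) {
            op.setup();
            calls.add("setup");
            long windowId = 0;
            for (List<T> window : windows) {
                op.beginWindow(windowId++);
                calls.add("beginWindow");
                for (T tuple : window) {
                    op.process(tuple);
                    calls.add("process");
                }
                op.endWindow();
                calls.add("endWindow");
            }
            op.teardown();
            calls.add("teardown");
        }
    }

    public static void main(String[] args) {
        WordLengthSum op = new WordLengthSum();
        OperatorTestDriver<String> driver = new OperatorTestDriver<>();
        driver.run(op, List.of(List.of("a", "bb"), List.of("ccc")));

        // Assert directly on operator output: no DAG, no console output to read.
        if (!op.emitted.equals(List.of(3L, 3L))) {
            throw new AssertionError("unexpected output: " + op.emitted);
        }
        if (!driver.calls.get(0).equals("setup")
                || !driver.calls.get(driver.calls.size() - 1).equals("teardown")) {
            throw new AssertionError("lifecycle order violated: " + driver.calls);
        }
        System.out.println("ok");
    }
}
```

Because the test holds a reference to the operator, it can check per-window state after any call, which is the advantage the list attributes to executing operators directly rather than inside a DAG.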
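The serialization point in the list above needs no DAG either: serialize the operator, deserialize it, and assert that the copy carries the same state and keeps working. Apex checkpoints operators with Kryo, and the thread recommends testing with Kryo through shared utilities. To keep this sketch dependency-free it swaps in plain `java.io` serialization, so it demonstrates the round-trip pattern only, not Kryo-specific behavior (Kryo, for example, does not require `Serializable` but by default relies on a no-arg constructor). `CountingOperator` and `roundTrip` are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.HashMap;
import java.util.Map;

public class SerializationRoundTripSketch {

    // Hypothetical operator with state that must survive a checkpoint.
    static class CountingOperator implements Serializable {
        private static final long serialVersionUID = 1L;
        final Map<String, Integer> counts = new HashMap<>();

        void process(String key) { counts.merge(key, 1, Integer::sum); }
    }

    // Serialize and deserialize an object, as a checkpoint/restore would.
    // Uses java.io serialization as a stand-in for Kryo.
    @SuppressWarnings("unchecked")
    static <T extends Serializable> T roundTrip(T obj) throws IOException, ClassNotFoundException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(obj);
        }
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
            return (T) in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        CountingOperator op = new CountingOperator();
        op.process("a");
        op.process("a");
        op.process("b");

        CountingOperator restored = roundTrip(op);
        // The restored copy must carry the same state...
        if (!restored.counts.equals(op.counts)) throw new AssertionError(restored.counts);
        // ...and be an independent object that keeps working after restore.
        restored.process("b");
        if (restored.counts.get("b") != 2 || op.counts.get("b") != 1) {
            throw new AssertionError("restored copy shares or lost state");
        }
        System.out.println("ok");
    }
}
```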
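The advice to stop relying on a fixed `localmode.run(n)` and instead wait for the expected results, capped by a timeout, fits naturally into a small shared helper. The sketch below is a hypothetical utility, not an existing Apex or Malhar API: a background thread stands in for a DAG running asynchronously in embedded mode and writing to a thread-safe sink, and `awaitCondition` polls until the expected results are in or the timeout expires, so a broken pipeline fails the test instead of passing silently.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.TimeoutException;
import java.util.function.BooleanSupplier;

public class AwaitResultsSketch {

    // Poll until the condition holds; throw TimeoutException if it never does.
    static void awaitCondition(BooleanSupplier condition, long timeoutMillis, long pollMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (!condition.getAsBoolean()) {
            // Overflow-safe comparison of nanoTime values.
            if (System.nanoTime() - deadline > 0) {
                throw new TimeoutException("condition not met within " + timeoutMillis + " ms");
            }
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Stands in for the sink of a DAG running asynchronously in embedded mode.
        List<Integer> results = new CopyOnWriteArrayList<>();
        Thread app = new Thread(() -> {
            for (int i = 1; i <= 5; i++) {
                results.add(i * i);
            }
        });
        app.start();

        // Wait only as long as needed, capped by a timeout, then assert on results.
        awaitCondition(() -> results.size() == 5, 10_000, 10);
        app.join();
        if (!results.equals(List.of(1, 4, 9, 16, 25))) {
            throw new AssertionError("unexpected results: " + results);
        }
        System.out.println("ok");
    }
}
```

The test finishes as soon as the results arrive, and a slow CI machine only fails after a generous timeout, which keeps the check independent of the timing of any particular environment.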
