Some sub-systems that I know of, particularly around readers, writers, value vectors (VVs) and operators, are not unit-testing friendly by design: first, they involve much more logic than one could reasonably call a unit; second, it is relatively tough, if not impossible, to control their behavior or to mock or inject their dependencies, because they are tightly coupled with other parts of the system. I would propose starting with small but fundamental refactorings that abstract away self-contained, cohesive pieces, so that we can get those unit-tested first. Applying this idea iteratively should steadily improve test coverage. Then we can focus on testing operators and other components that rely on these well-tested units. Either way, I would prefer a piecemeal approach over trying to unit-test an entire sub-system at once.
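To make the idea concrete, here is a minimal sketch of the kind of refactoring I mean: pull a collaborator out behind a small interface so the unit under test can be driven by a stub instead of the whole execution context. All names below are hypothetical illustrations, not actual Drill classes.

```java
// Hypothetical sketch: a collaborator extracted behind a small interface
// so the logic can be unit-tested with a stub instead of the real,
// tightly coupled component. None of these names come from the Drill code.

interface BatchSizer {
    int targetRecordCount();
}

// A self-contained, cohesive piece that can now be tested in isolation.
class BatchSplitter {
    private final BatchSizer sizer;

    BatchSplitter(BatchSizer sizer) {
        this.sizer = sizer;
    }

    /** How many output batches a given record count splits into. */
    int batchCount(int totalRecords) {
        int target = sizer.targetRecordCount();
        return (totalRecords + target - 1) / target;  // ceiling division
    }
}

public class BatchSplitterTest {
    public static void main(String[] args) {
        // Inject a stub sizer; no drillbit or query context required.
        BatchSplitter splitter = new BatchSplitter(() -> 1000);
        assert splitter.batchCount(2500) == 3;
        assert splitter.batchCount(1000) == 1;
        assert splitter.batchCount(0) == 0;
        System.out.println("ok");
    }
}
```

The point is not this particular class, but that once the dependency is an interface, the test controls every input and the "unit" stays small.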
-Hanifi

On Wed, Jun 17, 2015 at 1:53 PM, Abdel Hakim Deneche <adene...@maprtech.com>
wrote:

> I don't know how much work this involves (it seems a lot!) but this would
> be really useful. Like you said, with the current model coming up with
> good unit tests can be really tricky, especially when testing the edge
> cases, and the worst part is that any change to how queries are planned,
> or, for example, to the size of the batches, can make some tests useless.
>
> On Tue, Jun 16, 2015 at 12:38 PM, Jason Altekruse <altekruseja...@gmail.com>
> wrote:
>
> > Hello Drill devs,
> >
> > I would like to propose a proactive effort to make the Drill codebase
> > easier to unit test.
> > Many JIRAs have been created for bugs that should have been prevented
> > by better unit testing, and we are still fixing these kinds of bugs
> > today as they crop up. I have a few ideas, and I plan on creating JIRAs
> > for specific refactoring and test infrastructure improvements. Before I
> > do, I would like to collect thoughts from everyone on what can get us
> > the most benefit for our work.
> >
> > As a short overview of the situation today, most of the tests in Drill
> > take the form of running a SQL query on a local drillbit and verifying
> > the results. This has often been described as integration testing
> > rather than unit testing, and it has caused several common testing
> > pains and gaps.
> >
> > 1. Batch boundaries - as we cannot control where batches are cut off
> > during the query, complete queries often make it hard to test different
> > scenarios when processing an incoming stream of data with given
> > properties.
> >    - Examples of issues: inconsistent behavior between operators; some
> >      operators failed to handle empty batches, or a batch full of
> >      nulls, until we wrote a test that happened to have the right input
> >      file and plan to produce these scenarios.
> > 2. Valid planning changes can end up making tests previously designed
> > to test execution fail in new ways, as the data will now flow
> > differently through the operators.
> > 3. SQL queries as test specifications make it hard to test
> > "everything": all types, all possible data properties/structures, all
> > possible switches flipped in the planner or in the configuration of an
> > operator.
> >
> > I would like to start the discussion with a proposal to fix some of
> > these problems. We need a way to run an operator easily in isolation.
> > Possible steps to achieve this include a new operator that produces
> > data in explicitly provided batches and can be configured from a test.
> > This could serve as a universal input for unit-testing operators. We
> > would also need some way to consume and verify the output of the
> > operators. This could share code with the current query execution, or
> > possibly side-step it to avoid having to mock or instantiate the whole
> > query context.
> >
> > This proposal is itself testing a relatively large part of the system
> > as a whole "unit". I would be interested to hear opinions on the
> > utility vs. extra effort of refactoring more classes so that they can
> > be created in tests and have their individual methods tested. This is
> > already being done for some classes, like the value vectors, but it is
> > far from exhaustive. I don't expect us to start rigidly enforcing this
> > level of testing granularity everywhere, but there are components of
> > the system that really need to be resilient and guaranteed to stay that
> > way as the project evolves.
> >
> > Please chime in with your thoughts.
>
> --
> Abdelhakim Deneche
> Software Engineer
> <http://www.mapr.com/>
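For what Jason's proposed test-input operator could look like, here is a heavily simplified sketch. It is not Drill code and ignores value vectors and allocators entirely; the names and the `Iterator<int[]>`-as-batch-stream shape are illustrative assumptions. The point is only the control flow: the test hands the mock operator an explicit list of batches, including edge cases like an empty batch, and drives the operator under test deterministically.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the proposed test-input operator: it emits exactly
// the batches a test hands it, so downstream operator logic can be pushed
// through edge cases (empty batches, unusual boundaries) deterministically.
// None of these names come from the Drill codebase.

class MockScanOperator implements Iterator<int[]> {
    private final Iterator<int[]> batches;

    MockScanOperator(List<int[]> batches) {
        this.batches = batches.iterator();
    }

    public boolean hasNext() { return batches.hasNext(); }
    public int[] next()      { return batches.next(); }
}

// A trivial stand-in for an operator under test: sums every incoming batch.
class SumOperator {
    int consume(Iterator<int[]> upstream) {
        int total = 0;
        while (upstream.hasNext()) {
            for (int v : upstream.next()) total += v;
        }
        return total;
    }
}

public class OperatorHarnessDemo {
    public static void main(String[] args) {
        // The test controls batch boundaries exactly, including an empty batch,
        // instead of hoping a SQL query and plan happen to produce them.
        MockScanOperator scan = new MockScanOperator(Arrays.asList(
                new int[] {1, 2, 3},
                new int[] {},          // empty-batch edge case
                new int[] {4, 5}));
        int result = new SumOperator().consume(scan);
        assert result == 15;
        System.out.println(result);
    }
}
```

A matching output-verification side, as Jason suggests, would compare the batches an operator emits against expected batches supplied by the test, without standing up the whole query context.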