I've spent the middle portion of the day reviewing our test infrastructure
and related open tickets and have some ideas for making things better. I
titled this post for 3.3.0 but, in truth, I'm not sure what must wait for
3.3.0 and what might yet be useful and good for 3.2.x. I'm also using this
email as a way to organize my notes/ideas from the day, so apologies in
advance for the amount of stuff I'm about to dump here.

(1) Of all the things I came up with, I think the biggest breaker is this
one: have one uber test suite in gremlin-test. In other words, merge
gremlin-groovy-test into gremlin-test and get rid of the former altogether.
Then, stop publishing test artifacts out of hadoop-gremlin (and wherever
else we might be doing that). We can make the groovy and hadoop
dependencies optional so that providers who aren't using them don't get
them sucked in transitively and can just depend on them as needed.
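
For instance, the merged gremlin-test pom could mark those dependencies
optional, roughly like this (just a sketch - the exact artifact list would
need sorting out):

    <!-- available at compile time in gremlin-test, but not pulled in
         transitively unless a provider declares it explicitly -->
    <dependency>
      <groupId>org.apache.tinkerpop</groupId>
      <artifactId>gremlin-groovy</artifactId>
      <version>${project.version}</version>
      <optional>true</optional>
    </dependency>
    <dependency>
      <groupId>org.apache.tinkerpop</groupId>
      <artifactId>hadoop-gremlin</artifactId>
      <version>${project.version}</version>
      <optional>true</optional>
    </dependency>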

(2) Next biggest breaker - how does everyone feel about killing OptOut and
OptIn, getting those concepts out of gremlin-core and making them features
of gremlin-test? I've heard at least two graph providers mention a problem
where they want to "OptOut" at the GraphProvider level as opposed to the
Graph level, because their GraphProvider configurations do more to drive
that setting than the Graph does. I don't think we lose anything by moving
"OptOut" except for the describeGraph() functionality:

http://tinkerpop.apache.org/docs/current/reference/#describe-graph

which I'm not sure is a big enough deal to worry about. That was a nice
idea that never really developed any further than where it is right now.
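
As a rough sketch of what the GraphProvider-level version might look like
- the method here is hypothetical, nothing like it exists on GraphProvider
today - the suite would consult the provider rather than scan annotations
off the Graph class:

    import java.util.Collections;
    import java.util.Set;

    public interface GraphProvider {
        // ...existing GraphProvider contract elided...

        // Hypothetical replacement for @Graph.OptOut: the suite asks the
        // provider which tests to skip, so the decision can be driven by
        // the provider's configuration rather than a static annotation on
        // the Graph class. Entries might take the form
        // "fully.qualified.TestClass#testMethod".
        default Set<String> getOptedOutTests() {
            return Collections.emptySet();
        }
    }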

(3) We currently tie the GraphProvider to a specific configuration of a
Graph instance, so every time you want a slight permutation on that
configuration, you need to create a new GraphProvider implementation. I
think we can simplify that, cut down on the proliferation of those
implementations, and at the same time offer some added flexibility. I was
digging through the JUnit docs/code and I think there is a way for us to
create a "GraphProviderSource" which would annotate a test (rather than
@GraphProviderClass). The GraphProviderSource would produce a list of
GraphProvider instances to run each test in a suite with. So, if the
GraphProviderSource produced 4 different GraphProvider instances, it would
execute each test in the suite 4 times (once for each GraphProvider).
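
A minimal sketch of the shape of it (all of these names are hypothetical -
this is the idea, not a working runner):

    import java.lang.annotation.ElementType;
    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;
    import java.lang.annotation.Target;
    import java.util.Arrays;
    import java.util.List;
    import java.util.function.Supplier;

    // Hypothetical: names a factory for the GraphProvider permutations to
    // run a suite with, in place of @GraphProviderClass.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    public @interface GraphProviderSource {
        Class<? extends Supplier<List<GraphProvider>>> value();
    }

    // A provider could then expose its configuration permutations like
    // this - the constructor arguments are made up for illustration.
    class TinkerGraphProviders implements Supplier<List<GraphProvider>> {
        @Override
        public List<GraphProvider> get() {
            return Arrays.asList(
                    new TinkerGraphProvider("default"),
                    new TinkerGraphProvider("gryo-persistence"),
                    new TinkerGraphProvider("integer-ids"),
                    new TinkerGraphProvider("uuid-ids"));
        }
    }

A suite annotated with @GraphProviderSource(TinkerGraphProviders.class)
would then have each of its tests run once per returned provider - four
providers here, so four runs of every test.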

(4) I think this approach is nice because it ties into something else that
I think is important to us: getting maximum value for time out of our
tests. As we add GLVs and more tests (I think that, even without the
integration tests, we're over 12,000 tests right now), the time it takes
to do a basic mvn clean install is getting longer and longer. We want to
keep that as short as possible while maximizing code coverage. To that
end, I'll make several points:

+ JaCoCo now works well with Java 8 (I think it has for a while, but I
hadn't noticed). I worked with it a bit today and we should be able to get
a good aggregate test coverage report with it (assuming we are OK with
adding a new "gremlin-coverage" module to Maven - stinks, but perhaps
isn't so different from having added gremlin-benchmarks in some respects).
If we have that, we can find out which combinations of GraphProviders give
us the best coverage for time and make that our standard testing profile.
+ We can build some fairly exotic GraphProviderSource implementations that
could test all possible configuration options for TinkerGraph, cover
ranges of settings in Neo4jGraph, or randomize the returned GraphProviders
(see the sketch after this list). These could all be options we execute in
Docker during code freeze week (and perhaps periodically during our dev
cycles) to ensure we're not breaking anything in the "maximized"
configurations that a plain mvn clean install doesn't cover.
+ If that works, we can eliminate the use of Random in our test suite for
a standard mvn clean install, thus eliminating the chance of
non-deterministic behavior. Rather than being "random", we just test all
the cases.
+ Perhaps we could have different Maven profiles that run different
GraphProviderSource implementations. I'm thinking those might be triggered
from different Docker runs to help parallelize the tests and allow us to
cover more permutations more quickly?
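
To make the "exotic" source idea above concrete, here is a minimal sketch
(hypothetical names again, building on the shape from point 3) that
expands the cartesian product of a couple of TinkerGraph settings into one
GraphProvider per combination:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.function.Supplier;

    // Hypothetical "maximized" source: one GraphProvider per combination
    // of settings, so a Docker run during code freeze week covers every
    // permutation deterministically instead of sampling them with Random.
    class TinkerGraphPermutations implements Supplier<List<GraphProvider>> {
        private static final String[] ID_MANAGERS = {"LONG", "INTEGER", "UUID", "ANY"};
        private static final String[] IO_FORMATS = {"gryo", "graphson"};

        @Override
        public List<GraphProvider> get() {
            final List<GraphProvider> providers = new ArrayList<>();
            for (final String idManager : ID_MANAGERS) {
                for (final String format : IO_FORMATS) {
                    // constructor made up for illustration
                    providers.add(new TinkerGraphProvider(idManager, format));
                }
            }
            return providers;
        }
    }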

(5) Finally, I think we could speed up our test suite if we could figure
out a way to cache Graph.Features in the test suite. A lot of tests get
"ignored" because of test requirements, but the test suite needs a Graph
instance to check those requirements against the Features. For some
providers, creating the Graph instance introduces disk I/O even when the
test will be ignored because of a missing feature. That setup/teardown is
expensive and ends up slowing the tests. If we could cache the Features
somehow and thus avoid the Graph instance creation, we'd save some
processing - I suspect it would be helpful to us internally with Neo4j.
The trick, of course, is that a cached Features implementation can't be
dynamic, and there is at least one method on Features that is typically
driven dynamically by the schema. Very few tests use that, however, so
perhaps there is some way to work around that problem.
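
A minimal sketch of the caching idea, assuming Features really are static
for a given provider configuration (which, per the caveat above, the
schema-driven method breaks):

    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.function.Supplier;
    import org.apache.tinkerpop.gremlin.structure.Graph;

    // Hypothetical: the first test against a given configuration pays for
    // one throwaway Graph to read its Features; later tests check their
    // feature requirements against the cached copy and can be ignored
    // without any Graph creation or disk I/O.
    final class FeaturesCache {
        private static final Map<String, Graph.Features> CACHE = new ConcurrentHashMap<>();

        static Graph.Features get(final String configKey, final Supplier<Graph> graphFactory) {
            return CACHE.computeIfAbsent(configKey, k -> {
                final Graph graph = graphFactory.get();  // creation cost paid once
                try {
                    return graph.features();
                } finally {
                    try {
                        graph.close();
                    } catch (Exception ignored) {
                        // best effort - a failed close shouldn't fail the check
                    }
                }
            });
        }

        private FeaturesCache() {}
    }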

Well, my brain has been dumped. Thoughts welcome.
