I think everything Casey mentioned is a good call-out as we start getting into
specifics. I definitely agree it's a nontrivial amount of work, but lowering
the barrier to entry for a lot of PRs would substantially ease the burden on
both new and existing contributors.

@Mike,
As a heads up, I (very briefly) looked into the Docker side, and the extension
idea may not work with it to the extent we want (at least not without some
additional work on our part). It seems like the library I linked earlier, and
some similar ones, provide their own annotations directly rather than plugging
into the same extensions model.

Before we dive in too deep, it might be worth playing around with it more and
coming back to the group with a couple of concrete options. If you're
interested in looking into it, I *suspect* it'll boil down to something like
the following:
* Use Docker with something like the above link or testcontainers
<https://www.testcontainers.org/>. It's possible the Docker-based setup ends up
being lightweight / fast enough to use on at least a per-class basis like we do
now, rather than trying to generalize across all tests immediately (rough
sketch after this list).
* Roll our own code to have more fine-grained control over the Docker
lifecycle and which components need to be spun up for extensions.
* Figure out how to make the prior options play nice
* Other
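
As a rough sketch of the first option, a per-class testcontainers test might
look something like the following (the test class, method, and topic handling
are placeholders I made up; I haven't actually run this against our code):

import org.junit.jupiter.api.Test;
import org.testcontainers.containers.KafkaContainer;
import org.testcontainers.junit.jupiter.Container;
import org.testcontainers.junit.jupiter.Testcontainers;

// Hypothetical per-class integration test using testcontainers' JUnit 5 support.
@Testcontainers
class ParserKafkaIntegrationTest {

  // A static @Container field is started once before the class and stopped
  // after it, roughly matching our current per-class component lifecycle.
  @Container
  private static final KafkaContainer KAFKA = new KafkaContainer();

  @Test
  void messagesFlowThroughTheParser() {
    String brokers = KAFKA.getBootstrapServers();
    // ... build a producer/consumer against `brokers` and assert on the output topic
  }
}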

I'll probably dig in a bit on my own, but I'm not sure how much focused time
I'll be able to put into it in the immediate term. I can probably whip up a
quick demo of the extensions approach over the next week or so in a one-off
project, which might help with some of the experimentation. Feel free to reach
out if there's anything in particular that would be helpful to look into.
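
For reference, the rough shape of the extension I have in mind is below
(KafkaClusterExtension and SharedKafkaCluster are made-up names for
illustration; the real thing would presumably wrap one of our
InMemoryComponents or a container):

import org.junit.jupiter.api.extension.BeforeAllCallback;
import org.junit.jupiter.api.extension.ExtensionContext;

// Hypothetical extension backed by a singleton Kafka cluster (in-memory or Docker).
public class KafkaClusterExtension implements BeforeAllCallback {

  @Override
  public void beforeAll(ExtensionContext context) {
    // Stash the cluster in the root store so it's created once per run and
    // closed automatically after the last test in the run finishes.
    context.getRoot()
        .getStore(ExtensionContext.Namespace.GLOBAL)
        .getOrComputeIfAbsent("kafka-cluster", key -> new SharedKafkaCluster(),
            SharedKafkaCluster.class);
  }

  // Placeholder for whatever actually owns the cluster and exposes
  // createTopic/getBrokers-style helpers to the tests.
  static class SharedKafkaCluster implements ExtensionContext.Store.CloseableResource {
    SharedKafkaCluster() { /* start the cluster */ }

    @Override
    public void close() { /* stop the cluster after the whole run */ }
  }
}

Test classes would then just declare @ExtendWith(KafkaClusterExtension.class)
(plus whatever other components they need) and pull brokers/topics from it,
along the lines of the snippet further down the thread.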

The extensions approach does have a lot of benefits, but in my previous case I
had fewer components to work with and didn't have the classpath worries that
make real instances (i.e. Docker) more attractive here. It was sufficient for
those purposes, but there may have to be compromises: we depend on a lot more
of the Hadoop stack.

Migrating to JUnit 5 in general *should* be pretty easy. I don't think we
really use any of the features that changed in odd ways between versions, so it
should mostly just be updating annotations and imports (@Before to @BeforeEach,
etc.). I'm sure this glosses over at least a few gotchas, though.
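
For example, a typical test class would only change along these lines
(illustrative only, not lifted from our code base):

// JUnit 4:
//   import org.junit.Before;
//   import org.junit.Test;
//
//   @Before
//   public void setup() { ... }

// JUnit 5:
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;

class ExampleTest {

  @BeforeEach
  void setup() {
    // same body as before; only the annotation and import change
  }

  @Test
  void example() {
    // @Test now comes from org.junit.jupiter.api instead of org.junit
  }
}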

On Fri, May 3, 2019 at 10:09 AM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> I didn't get a chance to say so earlier, but Justin, I also like the JUnit
> 5 extension suggestion. I've gone through some en-masse changes before,
> e.g. standardizing the log4j construction idiom, and it honestly wasn't too
> bad. Just a thought, it might make sense to kick this off by upgrading from
> JUnit 4 to 5 across the whole code base, and then diving into some of the
> more 5-specific changes you're recommending as needed. I created this Jira
> a bit ago - https://issues.apache.org/jira/browse/METRON-2037. That was to
> upgrade to 4.13, but we might be able to kill 2 birds with one stone if we
> go to JUnit 5. I'm volunteering to look into this and/or see the work
> through to completion. What do you think?
>
> > - debuggability (right now we run the tests in the same JVM and setting
>    breakpoints is trivial, even in the innards of Hadoop.  This is very
>    valuable for figuring out what's going wrong and we'll need SOME
> solution
>    for it)
>
> Yeah Casey, I remember this from the last time we discussed it. That's the
> most important issue to be sure we have a handle on, imo. We'll need to figure
> out remote debugging in Docker containers. Not to mention, the execution
> path becomes a bit more spread out when we're running multiple components
> as nature intended across multiple processes.
>
>
>
> On Fri, May 3, 2019 at 7:14 AM Casey Stella <ceste...@gmail.com> wrote:
>
> > I just want to chime in and say I'm STRONGLY in favor of a docker-based
> > approach to testing (I specifically like the JUnit 5 extensions
> > suggestion).  I think that forcing a full-dev evaluation for every small
> PR
> > is a barrier to entry that I'd like to overcome.  I also think that this
> is
> > going to not be trivial.
> >
> > There will be weirdness/drama with:
> >
> >    - cleanup
> >    - setup in situations where multi-components are used
> >    - debuggability (right now we run the tests in the same JVM and
> setting
> >    breakpoints is trivial, even in the innards of Hadoop.  This is very
> >    valuable for figuring out what's going wrong and we'll need SOME
> > solution
> >    for it)
> >    - possible resource limitations in travis for running tests with
> >    multiple components
> >
> > Even so, with ALL of that being said, I still think the value outweighs
> the
> > difficulty by a factor of 10.  Being able to be confident after a travis
> > run that people aren't introducing subtle classpath or cross-component
> > interaction issues would open up 80% of the class of PRs that don't
> require
> > full-dev review.  That being said, I still don't think it's 100%.
> > Specifically, PRs which can credibly be argued that they touch
> installation
> > pathways would still need to be verified in full-dev as it's the only
> path
> > to validating that (otherwise we would be regressing in test coverage).
> >
> > On Wed, May 1, 2019 at 9:33 PM Justin Leet <justinjl...@gmail.com>
> wrote:
> >
> > > >
> > > > My impression is that this is already the status quo. But, if we
> think
> > we
> > > > need to be more clear on this, let's put up a vote to change the
> coding
> > > > guidelines and PR checklist. I've done this many times in the past,
> the
> > > > most obvious instances are when I've made doc changes or unit test
> > > > modifications because those will not impact full dev. I will own this
> > > item.
> > > > I think it can probably get rolled in with my dev guideline changes
> for
> > > > architecture diagrams.
> > >
> > >
> > > For completeness in our PR checklist: "- [ ] Have you verified the
> basic
> > > functionality of the build by building and running locally with Vagrant
> > > full-dev environment or the equivalent?"  In practice, you're right,
> but
> > > any newer contributors aren't necessarily going to know this.
> > >
> > > 1. I think we should create Jiras with the end deliverable being that
> our
> > > > private vs public API endpoints are clearly delineated. From there,
> we
> > > > create another round of javadoc - for the public APIs let the
> javadocs
> > > rain
> > > > from the heavens to your heart's content. It's for public consumption
> > and
> > > > should assist end users. See Mockito, for example -
> > > >
> > > >
> > >
> >
> https://static.javadoc.io/org.mockito/mockito-core/2.27.0/org/mockito/Mockito.html
> > > > .
> > > > For developer docs, I'm of the *extremely strong* opinion that this
> > > should
> > > > be limited. Emphasize module, package, class, and method naming
> > > conventions
> > > > over all else. If it doesn't make sense just reading the code, take a
> > > > minute to summarize what you're doing and consider refactoring. For
> > > > legitimately more complex and necessary code passages, add a note.
> For
> > > > multi-class interactions that provide a larger story arc, add dev
> docs
> > to
> > > > the relevant READMEs. Our use of Zookeeper Curator and its
> interaction
> > > with
> > > > our topology config loading is a perfect example of a feature that
> > would
> > > > fit this need.
> > > >
> > > 2. I'm an immediate -1 on any documentation that looks like " /** Open
> > the
> > > > car door **/ public void openCarDoor() {...}" :-). The code speaks
> for
> > > > itself.
> > > > 3. Publish javadocs for public APIs, not our internal dev APIs. Let
> > your
> > > > fav IDE fill in the gaps for devs.
> > >
> > >
> > > I'm +1 on delineating public vs private APIs like you've outlined
> > there.  I
> > > think our dev stuff is *better* than our general usage guides, but
> > there's
> > > room for improvement. I'm fairly agnostic on the dev docs because to be
> > > honest, a ton of our older code is not at all self explanatory, and to
> be
> > > blunt refactoring a lot of it is a substantial lift (as we've all seen
> > > multiple times trying to refactor it).  If this were greenfield, I'd be
> > in
> > > much stronger agreement with you, but I suspect in practice there's a
> lot
> > > of stuff nobody's going to refactor for awhile.
> > >
> > >
> > > > Full dev until we vote to replace the existing setup and can be
> > confident
> > > > that the new approach 1. is stable, 2. takes <= the amount of time to
> > > > complete as full dev. I am +1 for migrating towards this approach and
> > > think
> > > > we should do so when it's dialed in.
> > >
> > >
> > > Great, I look forward to that getting in.
> > >
> > > Justin, what are your thoughts on leveraging this approach along with
> > > > long-lived Docker containers?
> > > >
> > >
> > > Apparently, there's actually an extension for running Docker
> containers,
> > > see  https://faustxvi.github.io/junit5-docker/.  My main hesitation
> > there
> > > is more around how much effort to migrate it is. I think that's almost
> > > certainly a cleaner long term solution, but I suspect the 80% solution
> of
> > > migrating what we have *might* be easier.  There might also be ways of
> > just
> > > leveraging this by moving stuff over to failsafe from surefire, but I
> > > really don't know enough about it.
> > >
> > >  Clean/reset component state after test method and/or test class
> > > >    completion
> > >
> > >
> > > Probably doable regardless of approach, but things can take nontrivial
> > > amounts of time to clean up (e.g. topic deletion or clearing). I
> believe
> > we
> > > do this right now, so I'd expect to continue with that.
> > >
> > >
> > > On Wed, May 1, 2019 at 8:48 PM Michael Miklavcic <
> > > michael.miklav...@gmail.com> wrote:
> > >
> > > > Justin, what are your thoughts on leveraging this approach along with
> > > > long-lived Docker containers? I think the lifecycle would look like:
> > > >
> > > >    1. I need components A, B, C
> > > >    2. If not started, start A, B, C
> > > >    3. If started, clean/reset it
> > > >    4. Setup pre-test state
> > > >    5. Run test(s)
> > > >    6. Clean/reset component state after test method and/or test class
> > > >    completion (we may consider nuking this step - I assisted in
> > designing
> > > > an
> > > >    acceptance testing framework in the past where we handled cleanup
> on
> > > the
> > > >    front end. This meant that remediation for failed tests became
> > simpler
> > > >    because I still had the state from the failed test run)
> > > >
> > > > Stopping/removing the containers could be a manual process or just a
> > > simple
> > > > script in the project root. We could also add a special module that
> > runs
> > > > last that handles shutting down containers, if desired.
> > > >
> > > > I know this has been a perennial thread for us. I can dig up the
> > original
> > > > discuss threads on this as well if folks think they're still
> pertinent,
> > > but
> > > > I think we're pretty far removed now from that original point in time
> > and
> > > > should just move forward with fresh perspectives.
> > > >
> > > > On Wed, May 1, 2019 at 9:21 AM Justin Leet <justinjl...@gmail.com>
> > > wrote:
> > > >
> > > > > Re: the integration testing point above, a strategy I used recently
> > to
> > > > > alleviate a similar problem was to exploit JUnit 5's extensions.  I
> > > > haven't
> > > > > played with this at all in Metron's code, so 1) Take this with
> > > > > several grains of salt and 2) Feel encouraged to point out anything
> > > > > wrong/problematic/ludicrous.  My use case was pretty tame and easy
> to
> > > > > implement.
> > > > >
> > > > > Essentially, you can set up an extension backed by a singleton
> > cluster
> > > > (in
> > > > > the case I was doing, I had two extensions, one for a Kafka cluster
> > and
> > > > one
> > > > > for MiniDfs).  The backing clusters expose some methods (i.e.
> create
> > > > topic,
> > > > > get brokers, get file system, etc.), which lets the test classes set
> > > > > up whatever they need. Classes that need them just use an annotation
> and
> > > can
> > > > > be set up under the hood to init only when tests in the current run
> > > suite
> > > > > actually use them and spin down after all are done. This saved ~1
> min
> > > on
> > > > a
> > > > > 4 minute build.  It ends up being a pretty clean way to use them,
> > > > although
> > > > > we might need to be a bit more sophisticated, since my case was
> less
> > > > > complicated. The main messiness is that this necessarily invites
> > tests
> > > to
> > > > > interfere with each other (i.e. if multiple tests use the kafka
> topic
> > > > > "test", problems will be involved). We might be able to improve on
> > > this.
> > > > >
> > > > > We could likely do something similar with our InMemoryComponents.
> > Right
> > > > now
> > > > > these are spun up on a per-test basis, rather than the overall
> suite.
> > > > This
> > > > > involves some setup in any class that wants them, just to be able
> to
> > > get
> > > > a
> > > > > cluster.  There's also some definition of spin up order and so on,
> > > which
> > > > is
> > > > > already taken care of with the extensions (declaration order).  The
> > > other
> > > > > catch in our case is that we have way more code covered by JUnit 4,
> > > which
> > > > > doesn't have the same capabilities.  That migration doesn't
> > necessarily
> > > > > have to be done all at once, but it is a potential substantial pain
> > > > point.
> > > > >
> > > > > Basically, I'd hope to push management of our abstraction further
> > back
> > > > into
> > > > > actual JUnit so they get more reuse across runs and it's easily
> > > > > understood what a test needs and uses right up front.  In our case,
> > the
> > > > > InMemoryComponents likely map to extensions.
> > > > >
> > > > > If we did something like this, it potentially solves a few problems
> > > > > * Makes it easy to build an integration test that says "Give me
> these
> > > > > components".
> > > > > * Keeps it alive across any test that needs them
> > > > > * Only spins it up if there are tests running that do need them.
> Only
> > > > spins
> > > > > up the necessary components.
> > > > > * Lower our build times.
> > > > > * Interaction with components remains primarily through the same
> > > methods
> > > > > you'd normally interact (you can build producers/consumers, or
> > > whatever).
> > > > > * IMO, easier to explain and have people use.  I've had a couple
> > people
> > > > > pretty easily pick up on it.
> > > > >
> > > > > I can't share the code, but essentially it looks like (And I'm sure
> > > email
> > > > > will butcher this, so I'm sorry in advance):
> > > > > @ExtendWith({ExtensionOne.class, ExtensionTwo.class, ...})
> > > > > ...
> > > > > @BeforeAll
> > > > > public static void beforeAll() {
> > > > >    // Test specific setup, similar to before. But without the
> cluster
> > > > > creation work.
> > > > > }
> > > > >
> > > > > Everything else gets handled under the covers, with the caveat that
> > > what
> > > > I
> > > > > needed to do was less involved than some of our tests, so there may
> > be
> > > > some
> > > > > experimenting.
> > > > >
> > > > > On Wed, May 1, 2019 at 10:59 AM Justin Leet <justinjl...@gmail.com
> >
> > > > wrote:
> > > > >
> > > > > > Hi all,
> > > > > >
> > > > > > I wanted to start a discussion on something near and dear to all
> of
> > > our
> > > > > > hearts: The role of full-dev in our testing cycle.
> > > > > >
> > > > > > Right now, we require all PRs to have spun up the full-dev
> > > environment
> > > > > and
> > > > > > make sure that things flow through properly. In some cases, this
> > is a
> > > > > > necessity, and in other cases, it's a fairly large burden on
> > current
> > > > and
> > > > > > potential contributors.
> > > > > >
> > > > > > So what do we need to do to reduce our dependence on full dev and
> > > > > increase
> > > > > > our confidence in our CI process?
> > > > > >
> > > > > > My initial thoughts on this encompass a few things
> > > > > > * Increasing our ability to easily write automated tests. In
> > > > particular,
> > > > > I
> > > > > > think our integration testing is fairly hard and has some
> > structural
> > > > > issues
> > > > > > (e.g. our tests spin up components per Test class, not for the
> > > overall
> > > > > test
> > > > > > suite being run, which also increases build times, etc.). How do
> we
> > > > make
> > > > > > this easier and have less boilerplate?  I have one potential idea
> > > I'll
> > > > > > reply to this thread with.
> > > > > > * Unit tests in general. I'd argue that any area we think needs
> > > full-dev
> > > > > > spun up to be confident in needs more testing. Does anyone have
> any
> > > > > > opinions on which areas we have the least confidence in?  Which
> > areas
> > > > of
> > > > > > code are people afraid to touch?
> > > > > > * Our general procedures. Should we not be requiring full-dev on
> > all
> > > > PRs,
> > > > > > but maybe just asking for a justification why not (e.g. "I don't
> > need
> > > > > > full-dev for this, because I added unit tests here and here and
> > > > > integration
> > > > > > tests for the these components?"). And then a reviewer can
> request
> > a
> > > > > > full-dev spin up if needed?  Could we stage this rollout (e.g.
> > > > > > metron-platform modules require it, but others don't, etc.?)
> > This'll
> > > > add
> > > > > > more pressure onto the release phase, so maybe that involves
> > fleshing
> > > > > out a
> > > > > > checklist of critical path items to check.
> > > > > > * Do we need to improve our docs? Publish Javadocs? After all, if
> > > docs
> > > > > are
> > > > > > better, hopefully we'd see fewer issues introduced and be more
> > > confident
> > > > > in
> > > > > > changes coming in.
> > > > > > * What environment do we even want testing to occur in?
> > > > > > https://github.com/apache/metron/pull/1261 has seen a lot of
> work,
> > > and
> > > > > > getting it across the finish line may help make the dev
> environment
> > > > less
> > > > > > onerous, even if still needed.
> > > > > >
> > > > >
> > > >
> > >
> >
>
