I'm exploring the use of TestContainers right now as part of the HDP 3.1
effort.  Still exploring feasibility, but it is looking promising.

On Fri, May 3, 2019 at 10:46 AM Justin Leet <justinjl...@gmail.com> wrote:

> I think everything Casey mentioned is a good call-out as things start to
> build into specifics. I definitely agree it's a very nontrivial amount of
> work, but that lowering the barrier of entry to a lot of PRs eases the
> burden on both new and existing contributors by a substantial amount.
>
> @Mike,
> As a heads up, I (super briefly) looked into the Docker stuff a bit, and
> the extension idea may not work with the Docker stuff to the extent we want
> (at least without us doing some additional work ourselves).  It seems like
> at least what I linked earlier and some other stuff actually provide direct
> annotations rather than plugging directly into the same extensions idea.
>
> Before we dive into it too much, it might be worth playing around with it
> more and coming back to the group with a couple options. If you're
> interested in looking into it, I *suspect* it'll boil down to a couple
> options
> * Use Docker with something like the above link or testcontainers
> <https://www.testcontainers.org/>. It's possible the Docker stuff ends up
> being lightweight / fast enough to use on at least a per class basis like
> we do now, rather than trying to generalize across all tests immediately.
> * Roll our own code to have more fine-grained control over the Docker
> lifecycle and which components need to be spun up for extensions.
> * Figure out how to make the prior options play nice
> * Other
>
> I'll probably dig a bit on my own, but I'm not sure how much focused time
> I'll be able to put into it in the absolute immediate term.  I can probably
> whip up a quick demo of the extensions stuff over the next week or so in a
> one-off project to give a bit of a demo and maybe help out with some of
> experimentation. Feel free to reach out if there's anything in particular
> that would be helpful to look into.
>
> The extensions stuff does have a lot of benefits, but I had less components
> to work with and didn't have the same classpath worries that made real
> instances (i.e. Docker) more attractive. It was sufficient for our
> purposes, but there might have to be compromises here. We depend on a lot
> more of the Hadoop stack.
>
> Migrating to JUnit 5 in general *should* be pretty easy. I don't think we
> really use any of the stuff that migrated in odd ways, so it should mostly
> just be updating annotations and imports (@Before to @BeforeEach, etc.).
> I'm sure this glosses over at least a few gotchas, though.
>
> On Fri, May 3, 2019 at 10:09 AM Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > I didn't get a chance to say so earlier, but Justin, I also like the
> JUnit
> > 5 extension suggestion. I've gone through some en-masse changes before,
> > e.g. standardizing the log4j construction idiom, and it honestly wasn't
> too
> > bad. Just a thought, it might make sense to kick this off by upgrading
> > overall JUnit 4 to 5 across the code base, and then diving into some of
> the
> > more 5-specific changes you're recommending as needed. I created this
> Jira
> > a bit ago - https://issues.apache.org/jira/browse/METRON-2037. That was
> to
> > upgrade to 4.13, but we might be able to kill 2 birds with one stone if
> we
> > go to JUnit 5. I'm volunteering to look into this and/or see the work
> > through to completion. What do you think?
> >
> > > - debuggability (right now we run the tests in the same JVM and setting
> >    breakpoints is trivial, even in the innards of Hadoop.  This is very
> >    valuable for figuring out what's going wrong and we'll need SOME
> > solution
> >    for it)
> >
> > Yeah Casey, I remember this from the last time we discussed it. That's
> the
> > most import issue to be sure we have a handle on, imo. We'll need to
> figure
> > out remote debugging in Docker containers. Not to mention, the execution
> > path becomes a bit more spread out when we're running multiple components
> > as nature intended across multiple processes.
> >
> >
> >
> > On Fri, May 3, 2019 at 7:14 AM Casey Stella <ceste...@gmail.com> wrote:
> >
> > > I just want to chime in and say I'm STRONGLY in favor of a docker-based
> > > approach to testing (I specifically like the JUnit 5 extensions
> > > suggestion).  I think that forcing a full-dev evaluation for every
> small
> > PR
> > > is a barrier to entry that I'd like to overcome.  I also think that
> this
> > is
> > > going to not be trivial.
> > >
> > > There will be weirdness/drama with:
> > >
> > >    - cleanup
> > >    - setup in situations where multi-components are used
> > >    - debuggability (right now we run the tests in the same JVM and
> > setting
> > >    breakpoints is trivial, even in the innards of Hadoop.  This is very
> > >    valuable for figuring out what's going wrong and we'll need SOME
> > > solution
> > >    for it)
> > >    - possible resource limitations in travis for running tests with
> > >    multiple components
> > >
> > > Even so, with ALL of that being said, I still think the value outweighs
> > the
> > > difficulty by a factor of 10.  Being able to be confident after a
> travis
> > > run that people aren't introducing subtle classpath or cross-component
> > > interaction issues would open up 80% of the class of PRs that don't
> > require
> > > full-dev review.  That being said, I still don't think it's 100%.
> > > Specifically, PRs which can credibly be argued that they touch
> > installation
> > > pathways would still need to be verified in full-dev as it's the only
> > path
> > > to validating that (otherwise we would be regressing in test coverage).
> > >
> > > On Wed, May 1, 2019 at 9:33 PM Justin Leet <justinjl...@gmail.com>
> > wrote:
> > >
> > > > >
> > > > > My impression is that this is already the status quo. But, if we
> > think
> > > we
> > > > > need to be more clear on this, let's put up a vote to change the
> > coding
> > > > > guidelines and PR checklist. I've done this many times in the past,
> > the
> > > > > most obvious instances are when I've made doc changes or unit test
> > > > > modifications because those will not impact full dev. I will own
> this
> > > > item.
> > > > > I think it can probably get rolled in with my dev guideline changes
> > for
> > > > > architecture diagrams.
> > > >
> > > >
> > > > For completeness in our PR checklist: "- [ ] Have you verified the
> > basic
> > > > functionality of the build by building and running locally with
> Vagrant
> > > > full-dev environment or the equivalent?"  In practice, you're right,
> > but
> > > > any newer contributors aren't necessarily going to know this.
> > > >
> > > > 1. I think we should create Jiras with the end deliverable being that
> > our
> > > > > private vs public API endpoints are clearly delineated. From there,
> > we
> > > > > create another round of javadoc - for the public APIs let the
> > javadocs
> > > > rain
> > > > > from the heavens to your heart's content. It's for public
> consumption
> > > and
> > > > > should assist end users. See Mockito, for example -
> > > > >
> > > > >
> > > >
> > >
> >
> https://static.javadoc.io/org.mockito/mockito-core/2.27.0/org/mockito/Mockito.html
> > > > > .
> > > > > For developer docs, I'm of the *extremely strong* opinion that this
> > > > should
> > > > > be limited. Emphasize module, package, class, and method naming
> > > > conventions
> > > > > over all else. If it doesn't make sense just reading the code,
> take a
> > > > > minute to summarize what you're doing and consider refactoring. For
> > > > > legitimately more complex and necessary code passages, add a note.
> > For
> > > > > multi-class interactions that provide a larger story arc, add dev
> > docs
> > > to
> > > > > the relevant READMEs. Our use of Zookeeper Curator and its
> > interaction
> > > > with
> > > > > our topology config loading is a perfect example of a feature that
> > > would
> > > > > fit this need.
> > > > >
> > > > 2. I'm an immediate -1 on any documentation that looks like " /**
> Open
> > > the
> > > > > car door **/ public void openCarDoor() {...}" :-). The code speaks
> > for
> > > > > itself.
> > > > > 3. Publish javadocs for public APIs, not our internal dev APIs. Let
> > > your
> > > > > fav IDE fill in the gaps for devs.
> > > >
> > > >
> > > > I'm +1 on delineating public vs private APIs like you've outlined
> > > there.  I
> > > > think our dev stuff is *better* than our general usage guides, but
> > > there's
> > > > room for improvement. I'm fairly agnostic on the dev docs because to
> be
> > > > honest, a ton of our older code is not at all self explanatory, and
> to
> > be
> > > > blunt refactoring a lot of it is a substantial lift (as we've all
> seen
> > > > multiple times trying to refactor it).  If this were greenfield, I'd
> be
> > > in
> > > > much stronger agreement with you, but I suspect in practice there's a
> > lot
> > > > of stuff nobody's going to refactor for awhile.
> > > >
> > > >
> > > > > Full dev until we vote to replace the existing setup and can be
> > > confident
> > > > > that the new approach 1. is stable, 2. takes <= the amount of time
> to
> > > > > complete as full dev. I am +1 for migrating towards this approach
> and
> > > > think
> > > > > we should do so when it's dialed in.
> > > >
> > > >
> > > > Great, I look forward to that getting in.
> > > >
> > > > Justin, what are your thoughts on leveraging this approach along with
> > > > > long-lived Docker containers?
> > > > >
> > > >
> > > > Apparently, there's actually an extension for running Docker
> > containers,
> > > > see  https://faustxvi.github.io/junit5-docker/.  My main hesitation
> > > there
> > > > is more around how much effort to migrate it is. I think that's
> almost
> > > > certainly a cleaner long term solution, but I suspect the 80%
> solution
> > of
> > > > migrating what we have *might* be easier.  There might also be ways
> of
> > > just
> > > > leveraging this by moving stuff over to failsafe from surefire, but I
> > > > really don't know enough about it.
> > > >
> > > >  Clean/reset component state after test method and/or test class
> > > > >    completion
> > > >
> > > >
> > > > Probably doable regardless of approach, but things can take
> nontrivial
> > > > amounts of time to clean up (e.g. topic deletion or clearing). I
> > believe
> > > we
> > > > do this right now, so I'd expect to continue with that.
> > > >
> > > >
> > > > On Wed, May 1, 2019 at 8:48 PM Michael Miklavcic <
> > > > michael.miklav...@gmail.com> wrote:
> > > >
> > > > > Justin, what are your thoughts on leveraging this approach along
> with
> > > > > long-lived Docker containers? I think the lifecycle would look
> like:
> > > > >
> > > > >    1. I need components A, B, C
> > > > >    2. If not started, start A, B, C
> > > > >    3. If started, clean/reset it
> > > > >    4. Setup pre-test state
> > > > >    5. Run test(s)
> > > > >    6. Clean/reset component state after test method and/or test
> class
> > > > >    completion (we may consider nuking this step - I assisted in
> > > designing
> > > > > an
> > > > >    acceptance testing framework in the past where we handled
> cleanup
> > on
> > > > the
> > > > >    front end. This meant that remediation for failed tests became
> > > simpler
> > > > >    because I still had the state from the failed test run)
> > > > >
> > > > > Stopping/removing the containers could be a manual process or just
> a
> > > > simple
> > > > > script in the project root. We could also add a special module that
> > > runs
> > > > > last that handles shutting down containers, if desired.
> > > > >
> > > > > I know this has been a perennial thread for us. I can dig up the
> > > original
> > > > > discuss threads on this as well if folks think they're still
> > pertinent,
> > > > but
> > > > > I think we're pretty far removed now from that original point in
> time
> > > and
> > > > > should just move forward with fresh perspectives.
> > > > >
> > > > > On Wed, May 1, 2019 at 9:21 AM Justin Leet <justinjl...@gmail.com>
> > > > wrote:
> > > > >
> > > > > > Re: the integration testing point above, a strategy I used
> recently
> > > to
> > > > > > alleviate a similar problem was to exploit JUnit 5's
> extensions.  I
> > > > > haven't
> > > > > > played with this at all in Metron's code, so 1) Take this with a
> > > > several
> > > > > > grains of salt and 2) Feel encouraged to point out anything
> > > > > > wrong/problematic/ludicrous.  My use case was pretty tame and
> easy
> > to
> > > > > > implement.
> > > > > >
> > > > > > Essentially, you can set up an extension backed by a singleton
> > > cluster
> > > > > (in
> > > > > > the case I was doing, I had two extensions, one for a Kafka
> cluster
> > > and
> > > > > one
> > > > > > for MiniDfs).  The backing clusters expose some methods (i.e.
> > create
> > > > > topic,
> > > > > > get brokers, get file system, etc.), which lets the tests classes
> > > setup
> > > > > > whatever they need. Classes that need them just use an annotation
> > and
> > > > can
> > > > > > be setup under the hood to init only when tests in the current
> run
> > > > suite
> > > > > > actually use them and spin down after all are done. This saved ~1
> > min
> > > > on
> > > > > a
> > > > > > 4 minute build.  It ends up being a pretty clean way to use them,
> > > > > although
> > > > > > we might need to be a bit more sophisticated, since my case was
> > less
> > > > > > complicated. The main messiness is that this necessarily invites
> > > tests
> > > > to
> > > > > > interfere with each other (i.e. if multiple tests use the kafka
> > topic
> > > > > > "test", problems will be involved). We might be able to improve
> on
> > > > this.
> > > > > >
> > > > > > We could likely do something similar with our InMemoryComponents.
> > > Right
> > > > > now
> > > > > > these are spun up on a per-test basis, rather than the overall
> > suite.
> > > > > This
> > > > > > involves some setup in any class that wants them, just to be able
> > to
> > > > get
> > > > > a
> > > > > > cluster.  There's also some definition of spin up order and so
> on,
> > > > which
> > > > > is
> > > > > > already taken care of with the extensions (declaration order).
> The
> > > > other
> > > > > > catch in our case is that we have way more code covered by JUnit
> 4,
> > > > which
> > > > > > doesn't have the same capabilities.  That migration doesn't
> > > necessarily
> > > > > > have to be done all at once, but it is a potential substantial
> pain
> > > > > point.
> > > > > >
> > > > > > Basically, I'd hope to push management of our abstraction further
> > > back
> > > > > into
> > > > > > actual JUnit so they're get more reuse across runs and it's
> easily
> > > > > > understood what a test needs and uses right up front.  In our
> case,
> > > the
> > > > > > InMemoryComponents likely map to extensions.
> > > > > >
> > > > > > If we did something like this, it potentially solves a few
> problems
> > > > > > * Makes it easy to build an integration test that says "Give me
> > these
> > > > > > components".
> > > > > > * Keeps it alive across any test that needs them
> > > > > > * Only spins it up if there are tests running that do need them.
> > Only
> > > > > spins
> > > > > > up the necessary components.
> > > > > > * Lower our build times.
> > > > > > * Interaction with components remains primarily through the same
> > > > methods
> > > > > > you'd normally interact (you can build producers/consumers, or
> > > > whatever).
> > > > > > * IMO, easier to explain and have people use.  I've had a couple
> > > people
> > > > > > pretty easily pick up on it.
> > > > > >
> > > > > > I can't share the code, but essentially it looks like (And I'm
> sure
> > > > email
> > > > > > will butcher this, so I'm sorry in advance):
> > > > > > @ExtendWith({<ExtensionOne.class, ExtensionTwo.class, ...})
> > > > > > ...
> > > > > > @BeforeAll
> > > > > > public static void beforeAll() {
> > > > > >    // Test specific setup, similar to before. But without the
> > cluster
> > > > > > creation work.
> > > > > > }
> > > > > >
> > > > > > Everything else gets handled under the covers, with the caveat
> that
> > > > what
> > > > > I
> > > > > > needed to do was less involved than some of our tests, so there
> may
> > > be
> > > > > some
> > > > > > experimenting.
> > > > > >
> > > > > > On Wed, May 1, 2019 at 10:59 AM Justin Leet <
> justinjl...@gmail.com
> > >
> > > > > wrote:
> > > > > >
> > > > > > > Hi all,
> > > > > > >
> > > > > > > I wanted to start a discussion on something near and dear to
> all
> > of
> > > > our
> > > > > > > hearts: The role of full-dev in our testing cycle.
> > > > > > >
> > > > > > > Right now, we require all PRs to have spun up the full-dev
> > > > environment
> > > > > > and
> > > > > > > make sure that things flow through properly. In some cases,
> this
> > > is a
> > > > > > > necessity, and in other cases, it's a fairly large burden on
> > > current
> > > > > and
> > > > > > > potential contributors.
> > > > > > >
> > > > > > > So what do we need to do to reduce our dependence on full dev
> and
> > > > > > increase
> > > > > > > our confidence in our CI process?
> > > > > > >
> > > > > > > My initial thoughts on this encompass a few things
> > > > > > > * Increasing our ability to easily write automated tests. In
> > > > > particular,
> > > > > > I
> > > > > > > think our integration testing is fairly hard and has some
> > > structural
> > > > > > issues
> > > > > > > (e.g. our tests spin up components per Test class, not for the
> > > > overall
> > > > > > test
> > > > > > > suite being run, which also increases build times, etc.). How
> do
> > we
> > > > > make
> > > > > > > this easier and have less boilerplate?  I have one potential
> idea
> > > > I'll
> > > > > > > reply to this thread with.
> > > > > > > * Unit tests in general. I'd argue that any area we thing need
> > > > full-dev
> > > > > > > spun up to be confident in needs more testing. Does anyone have
> > any
> > > > > > > opinions on which areas we have the least confidence in?  Which
> > > areas
> > > > > of
> > > > > > > code are people afraid to touch?
> > > > > > > * Our general procedures. Should we not be requiring full-dev
> on
> > > all
> > > > > PRs,
> > > > > > > but maybe just asking for a justification why not (e.g. "I
> don't
> > > need
> > > > > > > full-dev for this, because I added unit tests here and here and
> > > > > > integration
> > > > > > > tests for the these components?"). And then a reviewer can
> > request
> > > a
> > > > > > > full-dev spin up if needed?  Could we stage this rollout (e.g.
> > > > > > > metron-platform modules require it, but others don't, etc.?)
> > > This'll
> > > > > add
> > > > > > > more pressure onto the release phase, so maybe that involves
> > > fleshing
> > > > > > out a
> > > > > > > checklist of critical path items to check.
> > > > > > > * Do we need to improve our docs? Publish Javadocs? After all,
> if
> > > > docs
> > > > > > are
> > > > > > > better, hopefully we'd see less issues introduced and be more
> > > > confident
> > > > > > in
> > > > > > > changes coming in.
> > > > > > > * What environment do we even want testing to occur in?
> > > > > > > https://github.com/apache/metron/pull/1261 has seen a lot of
> > work,
> > > > and
> > > > > > > getting it across the finish line may help make the dev
> > environment
> > > > > less
> > > > > > > onerous, even if still needed.
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to