[DISCUSS] Full-dev role in PR testign

2019-05-01 Thread Justin Leet
Hi all,

I wanted to start a discussion on something near and dear to all of our
hearts: The role of full-dev in our testing cycle.

Right now, we require all PRs to have spun up the full-dev environment and
make sure that things flow through properly. In some cases, this is a
necessity, and in other cases, it's a fairly large burden on current and
potential contributors.

So what do we need to do to reduce our dependence on full dev and increase
our confidence in our CI process?

My initial thoughts on this encompass a few things
* Increasing our ability to easily write automated tests. In particular, I
think our integration testing is fairly hard and has some structural issues
(e.g. our tests spin up components per Test class, not for the overall test
suite being run, which also increases build times, etc.). How do we make
this easier and have less boilerplate?  I have one potential idea I'll
reply to this thread with.
* Unit tests in general. I'd argue that any area we thing need full-dev
spun up to be confident in needs more testing. Does anyone have any
opinions on which areas we have the least confidence in?  Which areas of
code are people afraid to touch?
* Our general procedures. Should we not be requiring full-dev on all PRs,
but maybe just asking for a justification why not (e.g. "I don't need
full-dev for this, because I added unit tests here and here and integration
tests for the these components?"). And then a reviewer can request a
full-dev spin up if needed?  Could we stage this rollout (e.g.
metron-platform modules require it, but others don't, etc.?) This'll add
more pressure onto the release phase, so maybe that involves fleshing out a
checklist of critical path items to check.
* Do we need to improve our docs? Publish Javadocs? After all, if docs are
better, hopefully we'd see less issues introduced and be more confident in
changes coming in.
* What environment do we even want testing to occur in?
https://github.com/apache/metron/pull/1261 has seen a lot of work, and
getting it across the finish line may help make the dev environment less
onerous, even if still needed.


Re: [DISCUSS] Full-dev role in PR testign

2019-05-01 Thread Justin Leet
Re: the integration testing point above, a strategy I used recently to
alleviate a similar problem was to exploit JUnit 5's extensions.  I haven't
played with this at all in Metron's code, so 1) Take this with a several
grains of salt and 2) Feel encouraged to point out anything
wrong/problematic/ludicrous.  My use case was pretty tame and easy to
implement.

Essentially, you can set up an extension backed by a singleton cluster (in
the case I was doing, I had two extensions, one for a Kafka cluster and one
for MiniDfs).  The backing clusters expose some methods (i.e. create topic,
get brokers, get file system, etc.), which lets the tests classes setup
whatever they need. Classes that need them just use an annotation and can
be setup under the hood to init only when tests in the current run suite
actually use them and spin down after all are done. This saved ~1 min on a
4 minute build.  It ends up being a pretty clean way to use them, although
we might need to be a bit more sophisticated, since my case was less
complicated. The main messiness is that this necessarily invites tests to
interfere with each other (i.e. if multiple tests use the kafka topic
"test", problems will be involved). We might be able to improve on this.

We could likely do something similar with our InMemoryComponents. Right now
these are spun up on a per-test basis, rather than the overall suite. This
involves some setup in any class that wants them, just to be able to get a
cluster.  There's also some definition of spin up order and so on, which is
already taken care of with the extensions (declaration order).  The other
catch in our case is that we have way more code covered by JUnit 4, which
doesn't have the same capabilities.  That migration doesn't necessarily
have to be done all at once, but it is a potential substantial pain point.

Basically, I'd hope to push management of our abstraction further back into
actual JUnit so they're get more reuse across runs and it's easily
understood what a test needs and uses right up front.  In our case, the
InMemoryComponents likely map to extensions.

If we did something like this, it potentially solves a few problems
* Makes it easy to build an integration test that says "Give me these
components".
* Keeps it alive across any test that needs them
* Only spins it up if there are tests running that do need them. Only spins
up the necessary components.
* Lower our build times.
* Interaction with components remains primarily through the same methods
you'd normally interact (you can build producers/consumers, or whatever).
* IMO, easier to explain and have people use.  I've had a couple people
pretty easily pick up on it.

I can't share the code, but essentially it looks like (And I'm sure email
will butcher this, so I'm sorry in advance):
@ExtendWith({ wrote:

> Hi all,
>
> I wanted to start a discussion on something near and dear to all of our
> hearts: The role of full-dev in our testing cycle.
>
> Right now, we require all PRs to have spun up the full-dev environment and
> make sure that things flow through properly. In some cases, this is a
> necessity, and in other cases, it's a fairly large burden on current and
> potential contributors.
>
> So what do we need to do to reduce our dependence on full dev and increase
> our confidence in our CI process?
>
> My initial thoughts on this encompass a few things
> * Increasing our ability to easily write automated tests. In particular, I
> think our integration testing is fairly hard and has some structural issues
> (e.g. our tests spin up components per Test class, not for the overall test
> suite being run, which also increases build times, etc.). How do we make
> this easier and have less boilerplate?  I have one potential idea I'll
> reply to this thread with.
> * Unit tests in general. I'd argue that any area we thing need full-dev
> spun up to be confident in needs more testing. Does anyone have any
> opinions on which areas we have the least confidence in?  Which areas of
> code are people afraid to touch?
> * Our general procedures. Should we not be requiring full-dev on all PRs,
> but maybe just asking for a justification why not (e.g. "I don't need
> full-dev for this, because I added unit tests here and here and integration
> tests for the these components?"). And then a reviewer can request a
> full-dev spin up if needed?  Could we stage this rollout (e.g.
> metron-platform modules require it, but others don't, etc.?) This'll add
> more pressure onto the release phase, so maybe that involves fleshing out a
> checklist of critical path items to check.
> * Do we need to improve our docs? Publish Javadocs? After all, if docs are
> better, hopefully we'd see less issues introduced and be more confident in
> changes coming in.
> * What environment do we even want testing to occur in?
> https://github.com/apache/metron/pull/1261 has seen a lot of work, and
> getting it across the finish line may help make the dev environment less
> onerous

Re: [DISCUSS] Full-dev role in PR testign

2019-05-01 Thread Michael Miklavcic
> Our general procedures. Should we not be requiring full-dev on all PRs,
but maybe just asking for a justification why not (e.g. "I don't need
full-dev for this, because I added unit tests here and here and integration
tests for the these components?"). And then a reviewer can request a
full-dev spin up if needed?  Could we stage this rollout (e.g.
metron-platform modules require it, but others don't, etc.?) This'll add
more pressure onto the release phase, so maybe that involves fleshing out a
checklist of critical path items to check.

My impression is that this is already the status quo. But, if we think we
need to be more clear on this, let's put up a vote to change the coding
guidelines and PR checklist. I've done this many times in the past, the
most obvious instances are when I've made doc changes or unit test
modifications because those will not impact full dev. I will own this item.
I think it can probably get rolled in with my dev guideline changes for
architecture diagrams.

> Do we need to improve our docs? Publish Javadocs? After all, if docs are
better, hopefully we'd see less issues introduced and be more confident in
changes coming in.

1. I think we should create Jiras with the end deliverable being that our
private vs public API endpoints are clearly delineated. From there, we
create another round of javadoc - for the public APIs let the javadocs rain
from the heavens to your heart's content. It's for public consumption and
should assist end users. See Mockito, for example -
https://static.javadoc.io/org.mockito/mockito-core/2.27.0/org/mockito/Mockito.html.
For developer docs, I'm of the *extremely strong* opinion that this should
be limited. Emphasize module, package, class, and method naming conventions
over all else. If it doesn't make sense just reading the code, take a
minute to summarize what you're doing and consider refactoring. For
legitimately more complex and necessary code passages, add a note. For
multi-class interactions that provide a larger story arc, add dev docs to
the relevant READMEs. Our use of Zookeeper Curator and its interaction with
our topology config loading is a perfect example of a feature that would
fit this need.
2. I'm an immediate -1 on any documentation that looks like " /** Open the
car door **/ public void openCarDoor() {...}" :-). The code speaks for
itself.
3. Publish javadocs for public APIs, not our internal dev APIs. Let your
fav IDE fill in the gaps for devs.

>  What environment do we even want testing to occur in?
https://github.com/apache/metron/pull/1261 has seen a lot of work, and
getting it across the finish line may help make the dev environment less
onerous, even if still needed.

Full dev until we vote to replace the existing setup and can be confident
that the new approach 1. is stable, 2. takes <= the amount of time to
complete as full dev. I am +1 for migrating towards this approach and think
we should do so when it's dialed in.


On Wed, May 1, 2019 at 8:59 AM Justin Leet  wrote:

> Hi all,
>
> I wanted to start a discussion on something near and dear to all of our
> hearts: The role of full-dev in our testing cycle.
>
> Right now, we require all PRs to have spun up the full-dev environment and
> make sure that things flow through properly. In some cases, this is a
> necessity, and in other cases, it's a fairly large burden on current and
> potential contributors.
>
> So what do we need to do to reduce our dependence on full dev and increase
> our confidence in our CI process?
>
> My initial thoughts on this encompass a few things
> * Increasing our ability to easily write automated tests. In particular, I
> think our integration testing is fairly hard and has some structural issues
> (e.g. our tests spin up components per Test class, not for the overall test
> suite being run, which also increases build times, etc.). How do we make
> this easier and have less boilerplate?  I have one potential idea I'll
> reply to this thread with.
> * Unit tests in general. I'd argue that any area we thing need full-dev
> spun up to be confident in needs more testing. Does anyone have any
> opinions on which areas we have the least confidence in?  Which areas of
> code are people afraid to touch?
> * Our general procedures. Should we not be requiring full-dev on all PRs,
> but maybe just asking for a justification why not (e.g. "I don't need
> full-dev for this, because I added unit tests here and here and integration
> tests for the these components?"). And then a reviewer can request a
> full-dev spin up if needed?  Could we stage this rollout (e.g.
> metron-platform modules require it, but others don't, etc.?) This'll add
> more pressure onto the release phase, so maybe that involves fleshing out a
> checklist of critical path items to check.
> * Do we need to improve our docs? Publish Javadocs? After all, if docs are
> better, hopefully we'd see less issues introduced and be more confident in
> changes coming in.
> * What environment do we even

Re: [DISCUSS] Full-dev role in PR testign

2019-05-01 Thread Michael Miklavcic
Justin, what are your thoughts on leveraging this approach along with
long-lived Docker containers? I think the lifecycle would look like:

   1. I need components A, B, C
   2. If not started, start A, B, C
   3. If started, clean/reset it
   4. Setup pre-test state
   5. Run test(s)
   6. Clean/reset component state after test method and/or test class
   completion (we may consider nuking this step - I assisted in designing an
   acceptance testing framework in the past where we handled cleanup on the
   front end. This meant that remediation for failed tests became simpler
   because I still had the state from the failed test run)

Stopping/removing the containers could be a manual process or just a simple
script in the project root. We could also add a special module that runs
last that handles shutting down containers, if desired.

I know this has been a perennial thread for us. I can dig up the original
discuss threads on this as well if folks think they're still pertinent, but
I think we're pretty far removed now from that original point in time and
should just move forward with fresh perspectives.

On Wed, May 1, 2019 at 9:21 AM Justin Leet  wrote:

> Re: the integration testing point above, a strategy I used recently to
> alleviate a similar problem was to exploit JUnit 5's extensions.  I haven't
> played with this at all in Metron's code, so 1) Take this with a several
> grains of salt and 2) Feel encouraged to point out anything
> wrong/problematic/ludicrous.  My use case was pretty tame and easy to
> implement.
>
> Essentially, you can set up an extension backed by a singleton cluster (in
> the case I was doing, I had two extensions, one for a Kafka cluster and one
> for MiniDfs).  The backing clusters expose some methods (i.e. create topic,
> get brokers, get file system, etc.), which lets the tests classes setup
> whatever they need. Classes that need them just use an annotation and can
> be setup under the hood to init only when tests in the current run suite
> actually use them and spin down after all are done. This saved ~1 min on a
> 4 minute build.  It ends up being a pretty clean way to use them, although
> we might need to be a bit more sophisticated, since my case was less
> complicated. The main messiness is that this necessarily invites tests to
> interfere with each other (i.e. if multiple tests use the kafka topic
> "test", problems will be involved). We might be able to improve on this.
>
> We could likely do something similar with our InMemoryComponents. Right now
> these are spun up on a per-test basis, rather than the overall suite. This
> involves some setup in any class that wants them, just to be able to get a
> cluster.  There's also some definition of spin up order and so on, which is
> already taken care of with the extensions (declaration order).  The other
> catch in our case is that we have way more code covered by JUnit 4, which
> doesn't have the same capabilities.  That migration doesn't necessarily
> have to be done all at once, but it is a potential substantial pain point.
>
> Basically, I'd hope to push management of our abstraction further back into
> actual JUnit so they're get more reuse across runs and it's easily
> understood what a test needs and uses right up front.  In our case, the
> InMemoryComponents likely map to extensions.
>
> If we did something like this, it potentially solves a few problems
> * Makes it easy to build an integration test that says "Give me these
> components".
> * Keeps it alive across any test that needs them
> * Only spins it up if there are tests running that do need them. Only spins
> up the necessary components.
> * Lower our build times.
> * Interaction with components remains primarily through the same methods
> you'd normally interact (you can build producers/consumers, or whatever).
> * IMO, easier to explain and have people use.  I've had a couple people
> pretty easily pick up on it.
>
> I can't share the code, but essentially it looks like (And I'm sure email
> will butcher this, so I'm sorry in advance):
> @ExtendWith({ ...
> @BeforeAll
> public static void beforeAll() {
>// Test specific setup, similar to before. But without the cluster
> creation work.
> }
>
> Everything else gets handled under the covers, with the caveat that what I
> needed to do was less involved than some of our tests, so there may be some
> experimenting.
>
> On Wed, May 1, 2019 at 10:59 AM Justin Leet  wrote:
>
> > Hi all,
> >
> > I wanted to start a discussion on something near and dear to all of our
> > hearts: The role of full-dev in our testing cycle.
> >
> > Right now, we require all PRs to have spun up the full-dev environment
> and
> > make sure that things flow through properly. In some cases, this is a
> > necessity, and in other cases, it's a fairly large burden on current and
> > potential contributors.
> >
> > So what do we need to do to reduce our dependence on full dev and
> increase
> > our confidence in our CI process?
> >

Re: [DISCUSS] Full-dev role in PR testign

2019-05-01 Thread Justin Leet
>
> My impression is that this is already the status quo. But, if we think we
> need to be more clear on this, let's put up a vote to change the coding
> guidelines and PR checklist. I've done this many times in the past, the
> most obvious instances are when I've made doc changes or unit test
> modifications because those will not impact full dev. I will own this item.
> I think it can probably get rolled in with my dev guideline changes for
> architecture diagrams.


For completeness in our PR checklist: "- [ ] Have you verified the basic
functionality of the build by building and running locally with Vagrant
full-dev environment or the equivalent?"  In practice, you're right, but
any newer contributors aren't necessarily going to know this.

1. I think we should create Jiras with the end deliverable being that our
> private vs public API endpoints are clearly delineated. From there, we
> create another round of javadoc - for the public APIs let the javadocs rain
> from the heavens to your heart's content. It's for public consumption and
> should assist end users. See Mockito, for example -
>
> https://static.javadoc.io/org.mockito/mockito-core/2.27.0/org/mockito/Mockito.html
> .
> For developer docs, I'm of the *extremely strong* opinion that this should
> be limited. Emphasize module, package, class, and method naming conventions
> over all else. If it doesn't make sense just reading the code, take a
> minute to summarize what you're doing and consider refactoring. For
> legitimately more complex and necessary code passages, add a note. For
> multi-class interactions that provide a larger story arc, add dev docs to
> the relevant READMEs. Our use of Zookeeper Curator and its interaction with
> our topology config loading is a perfect example of a feature that would
> fit this need.
>
2. I'm an immediate -1 on any documentation that looks like " /** Open the
> car door **/ public void openCarDoor() {...}" :-). The code speaks for
> itself.
> 3. Publish javadocs for public APIs, not our internal dev APIs. Let your
> fav IDE fill in the gaps for devs.


I'm +1 on delineating public vs private APIs like you've outlined there.  I
think our dev stuff is *better* than our general usage guides, but there's
room for improvement. I'm fairly agnostic on the dev docs because to be
honest, a ton of our older code is not at all self explanatory, and to be
blunt refactoring a lot of it is a substantial lift (as we've all seen
multiple times trying to refactor it).  If this were greenfield, I'd be in
much stronger agreement with you, but I suspect in practice there's a lot
of stuff nobody's going to refactor for awhile.


> Full dev until we vote to replace the existing setup and can be confident
> that the new approach 1. is stable, 2. takes <= the amount of time to
> complete as full dev. I am +1 for migrating towards this approach and think
> we should do so when it's dialed in.


Great, I look forward to that getting in.

Justin, what are your thoughts on leveraging this approach along with
> long-lived Docker containers?
>

Apparently, there's actually an extension for running Docker containers,
see  https://faustxvi.github.io/junit5-docker/.  My main hesitation there
is more around how much effort to migrate it is. I think that's almost
certainly a cleaner long term solution, but I suspect the 80% solution of
migrating what we have *might* be easier.  There might also be ways of just
leveraging this by moving stuff over to failsafe from surefire, but I
really don't know enough about it.

 Clean/reset component state after test method and/or test class
>completion


Probably doable regardless of approach, but things can take nontrivial
amounts of time to clean up (e.g. topic deletion or clearing). I believe we
do this right now, so I'd expect to continue with that.


On Wed, May 1, 2019 at 8:48 PM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> Justin, what are your thoughts on leveraging this approach along with
> long-lived Docker containers? I think the lifecycle would look like:
>
>1. I need components A, B, C
>2. If not started, start A, B, C
>3. If started, clean/reset it
>4. Setup pre-test state
>5. Run test(s)
>6. Clean/reset component state after test method and/or test class
>completion (we may consider nuking this step - I assisted in designing
> an
>acceptance testing framework in the past where we handled cleanup on the
>front end. This meant that remediation for failed tests became simpler
>because I still had the state from the failed test run)
>
> Stopping/removing the containers could be a manual process or just a simple
> script in the project root. We could also add a special module that runs
> last that handles shutting down containers, if desired.
>
> I know this has been a perennial thread for us. I can dig up the original
> discuss threads on this as well if folks think they're still pertinent, but
> I think we're pretty far removed now

Re: [DISCUSS] Full-dev role in PR testign

2019-05-03 Thread Casey Stella
I just want to chime in and say I'm STRONGLY in favor of a docker-based
approach to testing (I specifically like the JUnit 5 extensions
suggestion).  I think that forcing a full-dev evaluation for every small PR
is a barrier to entry that I'd like to overcome.  I also think that this is
going to not be trivial.

There will be weirdness/drama with:

   - cleanup
   - setup in situations where multi-components are used
   - debuggability (right now we run the tests in the same JVM and setting
   breakpoints is trivial, even in the innards of Hadoop.  This is very
   valuable for figuring out what's going wrong and we'll need SOME solution
   for it)
   - possible resource limitations in travis for running tests with
   multiple components

Even so, with ALL of that being said, I still think the value outweighs the
difficulty by a factor of 10.  Being able to be confident after a travis
run that people aren't introducing subtle classpath or cross-component
interaction issues would open up 80% of the class of PRs that don't require
full-dev review.  That being said, I still don't think it's 100%.
Specifically, PRs which can credibly be argued that they touch installation
pathways would still need to be verified in full-dev as it's the only path
to validating that (otherwise we would be regressing in test coverage).

On Wed, May 1, 2019 at 9:33 PM Justin Leet  wrote:

> >
> > My impression is that this is already the status quo. But, if we think we
> > need to be more clear on this, let's put up a vote to change the coding
> > guidelines and PR checklist. I've done this many times in the past, the
> > most obvious instances are when I've made doc changes or unit test
> > modifications because those will not impact full dev. I will own this
> item.
> > I think it can probably get rolled in with my dev guideline changes for
> > architecture diagrams.
>
>
> For completeness in our PR checklist: "- [ ] Have you verified the basic
> functionality of the build by building and running locally with Vagrant
> full-dev environment or the equivalent?"  In practice, you're right, but
> any newer contributors aren't necessarily going to know this.
>
> 1. I think we should create Jiras with the end deliverable being that our
> > private vs public API endpoints are clearly delineated. From there, we
> > create another round of javadoc - for the public APIs let the javadocs
> rain
> > from the heavens to your heart's content. It's for public consumption and
> > should assist end users. See Mockito, for example -
> >
> >
> https://static.javadoc.io/org.mockito/mockito-core/2.27.0/org/mockito/Mockito.html
> > .
> > For developer docs, I'm of the *extremely strong* opinion that this
> should
> > be limited. Emphasize module, package, class, and method naming
> conventions
> > over all else. If it doesn't make sense just reading the code, take a
> > minute to summarize what you're doing and consider refactoring. For
> > legitimately more complex and necessary code passages, add a note. For
> > multi-class interactions that provide a larger story arc, add dev docs to
> > the relevant READMEs. Our use of Zookeeper Curator and its interaction
> with
> > our topology config loading is a perfect example of a feature that would
> > fit this need.
> >
> 2. I'm an immediate -1 on any documentation that looks like " /** Open the
> > car door **/ public void openCarDoor() {...}" :-). The code speaks for
> > itself.
> > 3. Publish javadocs for public APIs, not our internal dev APIs. Let your
> > fav IDE fill in the gaps for devs.
>
>
> I'm +1 on delineating public vs private APIs like you've outlined there.  I
> think our dev stuff is *better* than our general usage guides, but there's
> room for improvement. I'm fairly agnostic on the dev docs because to be
> honest, a ton of our older code is not at all self explanatory, and to be
> blunt refactoring a lot of it is a substantial lift (as we've all seen
> multiple times trying to refactor it).  If this were greenfield, I'd be in
> much stronger agreement with you, but I suspect in practice there's a lot
> of stuff nobody's going to refactor for awhile.
>
>
> > Full dev until we vote to replace the existing setup and can be confident
> > that the new approach 1. is stable, 2. takes <= the amount of time to
> > complete as full dev. I am +1 for migrating towards this approach and
> think
> > we should do so when it's dialed in.
>
>
> Great, I look forward to that getting in.
>
> Justin, what are your thoughts on leveraging this approach along with
> > long-lived Docker containers?
> >
>
> Apparently, there's actually an extension for running Docker containers,
> see  https://faustxvi.github.io/junit5-docker/.  My main hesitation there
> is more around how much effort to migrate it is. I think that's almost
> certainly a cleaner long term solution, but I suspect the 80% solution of
> migrating what we have *might* be easier.  There might also be ways of just
> leveraging this by moving stuff o

Re: [DISCUSS] Full-dev role in PR testign

2019-05-03 Thread Michael Miklavcic
I didn't get a chance to say so earlier, but Justin, I also like the JUnit
5 extension suggestion. I've gone through some en-masse changes before,
e.g. standardizing the log4j construction idiom, and it honestly wasn't too
bad. Just a thought, it might make sense to kick this off by upgrading
overall JUnit 4 to 5 across the code base, and then diving into some of the
more 5-specific changes you're recommending as needed. I created this Jira
a bit ago - https://issues.apache.org/jira/browse/METRON-2037. That was to
upgrade to 4.13, but we might be able to kill 2 birds with one stone if we
go to JUnit 5. I'm volunteering to look into this and/or see the work
through to completion. What do you think?

> - debuggability (right now we run the tests in the same JVM and setting
   breakpoints is trivial, even in the innards of Hadoop.  This is very
   valuable for figuring out what's going wrong and we'll need SOME solution
   for it)

Yeah Casey, I remember this from the last time we discussed it. That's the
most import issue to be sure we have a handle on, imo. We'll need to figure
out remote debugging in Docker containers. Not to mention, the execution
path becomes a bit more spread out when we're running multiple components
as nature intended across multiple processes.



On Fri, May 3, 2019 at 7:14 AM Casey Stella  wrote:

> I just want to chime in and say I'm STRONGLY in favor of a docker-based
> approach to testing (I specifically like the JUnit 5 extensions
> suggestion).  I think that forcing a full-dev evaluation for every small PR
> is a barrier to entry that I'd like to overcome.  I also think that this is
> going to not be trivial.
>
> There will be weirdness/drama with:
>
>- cleanup
>- setup in situations where multi-components are used
>- debuggability (right now we run the tests in the same JVM and setting
>breakpoints is trivial, even in the innards of Hadoop.  This is very
>valuable for figuring out what's going wrong and we'll need SOME
> solution
>for it)
>- possible resource limitations in travis for running tests with
>multiple components
>
> Even so, with ALL of that being said, I still think the value outweighs the
> difficulty by a factor of 10.  Being able to be confident after a travis
> run that people aren't introducing subtle classpath or cross-component
> interaction issues would open up 80% of the class of PRs that don't require
> full-dev review.  That being said, I still don't think it's 100%.
> Specifically, PRs which can credibly be argued that they touch installation
> pathways would still need to be verified in full-dev as it's the only path
> to validating that (otherwise we would be regressing in test coverage).
>
> On Wed, May 1, 2019 at 9:33 PM Justin Leet  wrote:
>
> > >
> > > My impression is that this is already the status quo. But, if we think
> we
> > > need to be more clear on this, let's put up a vote to change the coding
> > > guidelines and PR checklist. I've done this many times in the past, the
> > > most obvious instances are when I've made doc changes or unit test
> > > modifications because those will not impact full dev. I will own this
> > item.
> > > I think it can probably get rolled in with my dev guideline changes for
> > > architecture diagrams.
> >
> >
> > For completeness in our PR checklist: "- [ ] Have you verified the basic
> > functionality of the build by building and running locally with Vagrant
> > full-dev environment or the equivalent?"  In practice, you're right, but
> > any newer contributors aren't necessarily going to know this.
> >
> > 1. I think we should create Jiras with the end deliverable being that our
> > > private vs public API endpoints are clearly delineated. From there, we
> > > create another round of javadoc - for the public APIs let the javadocs
> > rain
> > > from the heavens to your heart's content. It's for public consumption
> and
> > > should assist end users. See Mockito, for example -
> > >
> > >
> >
> https://static.javadoc.io/org.mockito/mockito-core/2.27.0/org/mockito/Mockito.html
> > > .
> > > For developer docs, I'm of the *extremely strong* opinion that this
> > should
> > > be limited. Emphasize module, package, class, and method naming
> > conventions
> > > over all else. If it doesn't make sense just reading the code, take a
> > > minute to summarize what you're doing and consider refactoring. For
> > > legitimately more complex and necessary code passages, add a note. For
> > > multi-class interactions that provide a larger story arc, add dev docs
> to
> > > the relevant READMEs. Our use of Zookeeper Curator and its interaction
> > with
> > > our topology config loading is a perfect example of a feature that
> would
> > > fit this need.
> > >
> > 2. I'm an immediate -1 on any documentation that looks like " /** Open
> the
> > > car door **/ public void openCarDoor() {...}" :-). The code speaks for
> > > itself.
> > > 3. Publish javadocs for public APIs, not our internal dev

Re: [DISCUSS] Full-dev role in PR testign

2019-05-03 Thread Justin Leet
I think everything Casey mentioned is a good call-out as things start to
build into specifics. I definitely agree it's a very nontrivial amount of
work, but that lowering the barrier of entry to a lot of PRs eases the
burden on both new and existing contributors by a substantial amount.

@Mike,
As a heads up, I (super briefly) looked into the Docker stuff a bit, and
the extension idea may not work with the Docker stuff to the extent we want
(at least without us doing some additional work ourselves).  It seems like
at least what I linked earlier and some other stuff actually provide direct
annotations rather than plugging directly into the same extensions idea.

Before we dive into it too much, it might be worth playing around with it
more and coming back to the group with a couple options. If you're
interested in looking into it, I *suspect* it'll boil down to a couple
options
* Use Docker with something like the above link or testcontainers
. It's possible the Docker stuff ends up
being lightweight / fast enough to use on at least a per class basis like
we do now, rather than trying to generalize across all tests immediately.
* Roll our own code to have more fine-grained control over the Docker
lifecycle and which components need to be spun up for extensions.
* Figure out how to make the prior options play nice
* Other

I'll probably dig a bit on my own, but I'm not sure how much focused time
I'll be able to put into it in the absolute immediate term.  I can probably
whip up a quick demo of the extensions stuff over the next week or so in a
one-off project to give a bit of a demo and maybe help out with some of
experimentation. Feel free to reach out if there's anything in particular
that would be helpful to look into.

The extensions stuff does have a lot of benefits, but I had less components
to work with and didn't have the same classpath worries that made real
instances (i.e. Docker) more attractive. It was sufficient for our
purposes, but there might have to be compromises here. We depend on a lot
more of the Hadoop stack.

Migrating to JUnit 5 in general *should* be pretty easy. I don't think we
really use any of the stuff that migrated in odd ways, so it should mostly
just be updating annotations and imports (@Before to @BeforeEach, etc.).
I'm sure this glosses over at least a few gotchas, though.

On Fri, May 3, 2019 at 10:09 AM Michael Miklavcic <
michael.miklav...@gmail.com> wrote:

> I didn't get a chance to say so earlier, but Justin, I also like the JUnit
> 5 extension suggestion. I've gone through some en-masse changes before,
> e.g. standardizing the log4j construction idiom, and it honestly wasn't too
> bad. Just a thought, it might make sense to kick this off by upgrading
> overall JUnit 4 to 5 across the code base, and then diving into some of the
> more 5-specific changes you're recommending as needed. I created this Jira
> a bit ago - https://issues.apache.org/jira/browse/METRON-2037. That was to
> upgrade to 4.13, but we might be able to kill 2 birds with one stone if we
> go to JUnit 5. I'm volunteering to look into this and/or see the work
> through to completion. What do you think?
>
> > - debuggability (right now we run the tests in the same JVM and setting
>breakpoints is trivial, even in the innards of Hadoop.  This is very
>valuable for figuring out what's going wrong and we'll need SOME
> solution
>for it)
>
> Yeah Casey, I remember this from the last time we discussed it. That's the
> most import issue to be sure we have a handle on, imo. We'll need to figure
> out remote debugging in Docker containers. Not to mention, the execution
> path becomes a bit more spread out when we're running multiple components
> as nature intended across multiple processes.
>
>
>
> On Fri, May 3, 2019 at 7:14 AM Casey Stella  wrote:
>
> > I just want to chime in and say I'm STRONGLY in favor of a docker-based
> > approach to testing (I specifically like the JUnit 5 extensions
> > suggestion).  I think that forcing a full-dev evaluation for every small
> PR
> > is a barrier to entry that I'd like to overcome.  I also think that this
> is
> > going to not be trivial.
> >
> > There will be weirdness/drama with:
> >
> >- cleanup
> >- setup in situations where multi-components are used
> >- debuggability (right now we run the tests in the same JVM and
> setting
> >breakpoints is trivial, even in the innards of Hadoop.  This is very
> >valuable for figuring out what's going wrong and we'll need SOME
> > solution
> >for it)
> >- possible resource limitations in travis for running tests with
> >multiple components
> >
> > Even so, with ALL of that being said, I still think the value outweighs
> the
> > difficulty by a factor of 10.  Being able to be confident after a travis
> > run that people aren't introducing subtle classpath or cross-component
> > interaction issues would open up 80% of the class of PRs that don't
> require

Re: [DISCUSS] Full-dev role in PR testign

2019-05-03 Thread Nick Allen
I'm exploring the use of TestContainers right now as part of the HDP 3.1
effort.  Still exploring feasibility, but it is looking promising.

On Fri, May 3, 2019 at 10:46 AM Justin Leet  wrote:

> I think everything Casey mentioned is a good call-out as things start to
> build into specifics. I definitely agree it's a very nontrivial amount of
> work, but that lowering the barrier of entry to a lot of PRs eases the
> burden on both new and existing contributors by a substantial amount.
>
> @Mike,
> As a heads up, I (super briefly) looked into the Docker stuff a bit, and
> the extension idea may not work with the Docker stuff to the extent we want
> (at least without us doing some additional work ourselves).  It seems like
> at least what I linked earlier and some other stuff actually provide direct
> annotations rather than plugging directly into the same extensions idea.
>
> Before we dive into it too much, it might be worth playing around with it
> more and coming back to the group with a couple options. If you're
> interested in looking into it, I *suspect* it'll boil down to a couple
> options
> * Use Docker with something like the above link or testcontainers
> . It's possible the Docker stuff ends up
> being lightweight / fast enough to use on at least a per class basis like
> we do now, rather than trying to generalize across all tests immediately.
> * Roll our own code to have more fine-grained control over the Docker
> lifecycle and which components need to be spun up for extensions.
> * Figure out how to make the prior options play nice
> * Other
>
> I'll probably dig a bit on my own, but I'm not sure how much focused time
> I'll be able to put into it in the absolute immediate term.  I can probably
> whip up a quick demo of the extensions stuff over the next week or so in a
> one-off project to give a bit of a demo and maybe help out with some of
> experimentation. Feel free to reach out if there's anything in particular
> that would be helpful to look into.
>
> The extensions stuff does have a lot of benefits, but I had less components
> to work with and didn't have the same classpath worries that made real
> instances (i.e. Docker) more attractive. It was sufficient for our
> purposes, but there might have to be compromises here. We depend on a lot
> more of the Hadoop stack.
>
> Migrating to JUnit 5 in general *should* be pretty easy. I don't think we
> really use any of the stuff that migrated in odd ways, so it should mostly
> just be updating annotations and imports (@Before to @BeforeEach, etc.).
> I'm sure this glosses over at least a few gotchas, though.
>
> On Fri, May 3, 2019 at 10:09 AM Michael Miklavcic <
> michael.miklav...@gmail.com> wrote:
>
> > I didn't get a chance to say so earlier, but Justin, I also like the
> JUnit
> > 5 extension suggestion. I've gone through some en-masse changes before,
> > e.g. standardizing the log4j construction idiom, and it honestly wasn't
> too
> > bad. Just a thought, it might make sense to kick this off by upgrading
> > overall JUnit 4 to 5 across the code base, and then diving into some of
> the
> > more 5-specific changes you're recommending as needed. I created this
> Jira
> > a bit ago - https://issues.apache.org/jira/browse/METRON-2037. That was
> to
> > upgrade to 4.13, but we might be able to kill 2 birds with one stone if
> we
> > go to JUnit 5. I'm volunteering to look into this and/or see the work
> > through to completion. What do you think?
> >
> > > - debuggability (right now we run the tests in the same JVM and setting
> >breakpoints is trivial, even in the innards of Hadoop.  This is very
> >valuable for figuring out what's going wrong and we'll need SOME
> > solution
> >for it)
> >
> > Yeah Casey, I remember this from the last time we discussed it. That's
> the
> > most import issue to be sure we have a handle on, imo. We'll need to
> figure
> > out remote debugging in Docker containers. Not to mention, the execution
> > path becomes a bit more spread out when we're running multiple components
> > as nature intended across multiple processes.
> >
> >
> >
> > On Fri, May 3, 2019 at 7:14 AM Casey Stella  wrote:
> >
> > > I just want to chime in and say I'm STRONGLY in favor of a docker-based
> > > approach to testing (I specifically like the JUnit 5 extensions
> > > suggestion).  I think that forcing a full-dev evaluation for every
> small
> > PR
> > > is a barrier to entry that I'd like to overcome.  I also think that
> this
> > is
> > > going to not be trivial.
> > >
> > > There will be weirdness/drama with:
> > >
> > >- cleanup
> > >- setup in situations where multi-components are used
> > >- debuggability (right now we run the tests in the same JVM and
> > setting
> > >breakpoints is trivial, even in the innards of Hadoop.  This is very
> > >valuable for figuring out what's going wrong and we'll need SOME
> > > solution
> > >for it)
> > >- possible resource 

Re: [DISCUSS] Full-dev role in PR testign

2019-05-03 Thread Michael Miklavcic
That's awesome, Nick! Looking forward to seeing how this works out.

On Fri, May 3, 2019 at 10:06 AM Nick Allen  wrote:

> I'm exploring the use of TestContainers right now as part of the HDP 3.1
> effort.  Still exploring feasibility, but it is looking promising.
>
> On Fri, May 3, 2019 at 10:46 AM Justin Leet  wrote:
>
> > I think everything Casey mentioned is a good call-out as things start to
> > build into specifics. I definitely agree it's a very nontrivial amount of
> > work, but that lowering the barrier of entry to a lot of PRs eases the
> > burden on both new and existing contributors by a substantial amount.
> >
> > @Mike,
> > As a heads up, I (super briefly) looked into the Docker stuff a bit, and
> > the extension idea may not work with the Docker stuff to the extent we
> want
> > (at least without us doing some additional work ourselves).  It seems
> like
> > at least what I linked earlier and some other stuff actually provide
> direct
> > annotations rather than plugging directly into the same extensions idea.
> >
> > Before we dive into it too much, it might be worth playing around with it
> > more and coming back to the group with a couple options. If you're
> > interested in looking into it, I *suspect* it'll boil down to a couple
> > options
> > * Use Docker with something like the above link or testcontainers
> > . It's possible the Docker stuff ends
> up
> > being lightweight / fast enough to use on at least a per class basis like
> > we do now, rather than trying to generalize across all tests immediately.
> > * Roll our own code to have more fine-grained control over the Docker
> > lifecycle and which components need to be spun up for extensions.
> > * Figure out how to make the prior options play nice
> > * Other
> >
> > I'll probably dig a bit on my own, but I'm not sure how much focused time
> > I'll be able to put into it in the absolute immediate term.  I can
> probably
> > whip up a quick demo of the extensions stuff over the next week or so in
> a
> > one-off project to give a bit of a demo and maybe help out with some of
> > experimentation. Feel free to reach out if there's anything in particular
> > that would be helpful to look into.
> >
> > The extensions stuff does have a lot of benefits, but I had less
> components
> > to work with and didn't have the same classpath worries that made real
> > instances (i.e. Docker) more attractive. It was sufficient for our
> > purposes, but there might have to be compromises here. We depend on a lot
> > more of the Hadoop stack.
> >
> > Migrating to JUnit 5 in general *should* be pretty easy. I don't think we
> > really use any of the stuff that migrated in odd ways, so it should
> mostly
> > just be updating annotations and imports (@Before to @BeforeEach, etc.).
> > I'm sure this glosses over at least a few gotchas, though.
> >
> > On Fri, May 3, 2019 at 10:09 AM Michael Miklavcic <
> > michael.miklav...@gmail.com> wrote:
> >
> > > I didn't get a chance to say so earlier, but Justin, I also like the
> > JUnit
> > > 5 extension suggestion. I've gone through some en-masse changes before,
> > > e.g. standardizing the log4j construction idiom, and it honestly wasn't
> > too
> > > bad. Just a thought, it might make sense to kick this off by upgrading
> > > overall JUnit 4 to 5 across the code base, and then diving into some of
> > the
> > > more 5-specific changes you're recommending as needed. I created this
> > Jira
> > > a bit ago - https://issues.apache.org/jira/browse/METRON-2037. That
> was
> > to
> > > upgrade to 4.13, but we might be able to kill 2 birds with one stone if
> > we
> > > go to JUnit 5. I'm volunteering to look into this and/or see the work
> > > through to completion. What do you think?
> > >
> > > > - debuggability (right now we run the tests in the same JVM and
> setting
> > >breakpoints is trivial, even in the innards of Hadoop.  This is very
> > >valuable for figuring out what's going wrong and we'll need SOME
> > > solution
> > >for it)
> > >
> > > Yeah Casey, I remember this from the last time we discussed it. That's
> > the
> > > most import issue to be sure we have a handle on, imo. We'll need to
> > figure
> > > out remote debugging in Docker containers. Not to mention, the
> execution
> > > path becomes a bit more spread out when we're running multiple
> components
> > > as nature intended across multiple processes.
> > >
> > >
> > >
> > > On Fri, May 3, 2019 at 7:14 AM Casey Stella 
> wrote:
> > >
> > > > I just want to chime in and say I'm STRONGLY in favor of a
> docker-based
> > > > approach to testing (I specifically like the JUnit 5 extensions
> > > > suggestion).  I think that forcing a full-dev evaluation for every
> > small
> > > PR
> > > > is a barrier to entry that I'd like to overcome.  I also think that
> > this
> > > is
> > > > going to not be trivial.
> > > >
> > > > There will be weirdness/drama with:
> > > >
> > > >- cleanup
> > > >- set