I see the need for, and like, KinesisIOTestPipelineOptions - it would allow all ITs/perf tests that need a Kinesis connection to get one.

What problem are we trying to solve with KinesisIOTest1PipelineOptions? This presumes that there would also be a KinesisIOTest2PipelineOptions? What would be different between the two? Maybe it would help if we had a scenario and used "real" names (i.e., what we'd actually call the variables).

S

On Thu, Jan 12, 2017 at 5:47 PM Jason Kuster <[email protected]> wrote:

Stephen, I agree with you about option 2 -- I think it strikes the right balance. To your question at the end, Luke essentially answered it, but to elaborate: we could make, e.g., KinesisIOTestPipelineOptions, which extends TestPipelineOptions, and then KinesisIOTest1PipelineOptions, which extends KinesisIOTestPipelineOptions (whew, that's a workout for the fingers). It requires a change to how we parse PipelineOptions inside of TestPipeline.java, which means we lose a bit of user-friendliness, but it enables this scenario, which I think is worth the tradeoff.

On Tue, Jan 10, 2017 at 4:23 PM, Lukasz Cwik <[email protected]> wrote:

PipelineOptions allows you to aggregate common options within a parent interface, with child interfaces extending and inheriting those options.

For example, you can have MyIOOptionsTest1, which extends MyIOOptions, and also MyIOOptionsTest2, which extends MyIOOptions. MyIOOptions can have all the shared fields, such as port or host or whatever, and then MyIOOptionsTest1 can have specific parameters for Test1 and MyIOOptionsTest2 can have specific parameters for Test2.

On Tue, Jan 10, 2017 at 4:11 PM, Stephen Sisk <[email protected]> wrote:

Thanks, Jason, for doing this research! There are a lot of options.

You mentioned "Steven" - I assume you're talking about me :)

As you mention in your goals, we want this to work both for individual devs working on a particular store and for the Beam CI system. I'm inclined to favor making this pretty easy for individual devs, as long as it doesn't come at the cost of making CI config painful. Given that, I suspect we'd want something that's straightforward to discover how to configure, that gives an obvious error message if it's misconfigured, and that is easy to learn.

In that world, I would lean towards options 2 & 3. I think a file that has to be maintained/configured separately outside of the repository isn't the most developer-friendly option. That may be somewhat of a workflow thing. If it were super important to keep the passwords secret, then a file outside of the repository might be the only option that would work - however, I think we can simplify the problem and secure our test servers by keeping them on a private network. (I don't think we are an important target, so I'm not too worried about defense in depth.)

Between options 2 & 3 I would favor option 2 - I think having defaults pre-set to work in the Beam CI environment will lead to a lot of potential confusion on the part of developers, since it would let them run the test and then only give them errors about connecting to a server. I think it'd be better to get errors about pipeline options not being set, or about obviously wrong/default pipeline option values, rather than "correct, but not for your environment" settings. For example, if we have valid defaults and a user only sets some of the values (i.e., sets the IP but not the port), it'd be trickier to detect when the value being used is valid. I'd want the test to show users an error that the option wasn't set, rather than have it try to connect and time out because the port isn't correct.

I also don't think that having per-IT options is a good idea.
If 2 different ITs can share a data store, we should be able to pass them the same config - that also makes me strongly favor an option where we have a shared set of options. Do we *have* to make it part of TestPipelineOptions? Can we make something like a BeamIOTestPipelineOptions?

This generally leads me to favor option 2.

S

On Tue, Jan 10, 2017 at 3:39 PM Jason Kuster <[email protected]> wrote:

Hi all,

Following up on some of the discussions already on-list, I wanted to solicit some more feedback about some implementation details regarding the IO integration tests.

As it currently stands, we mostly have IO ITs for GCP-based IO, which our GCP-based Jenkins executors handle natively, but as our integration test coverage begins to expand, we're going to run into several of the problems relevant to what Steven is doing with hosting data stores for use by ITs. I wanted to get people's feedback about how to handle passing credentials to the ITs. We have a couple of options, motivated by some goals.

Goals:

* Has to work in the Apache Beam CI environment.
* Has to run on dev machines (w/o access to the Beam CI environment).
* Only one way of passing data store config.
* An individual IT fails fast if it is run without valid config.
* IO performance tests will have a validation component (this implies we need to run the IO ITs, not just the IO IT pipelines).
* Devs working on an individual IO transform can run integration & perf tests without recreating the data stores every time.
* Devs working on a runner's IO can run all the IO integration & perf tests. They may have to recreate the data stores every time (or possibly have a manual config that helps with this). It's okay if this world is a bit awkward; it just needs to be possible.

Option 1: IO Configuration File

The first option is to read all credentials from some file stored on disk. We can define a location for an (XML, JSON, YAML, etc.) file which we can read in each IT to find the credentials that IT needs. This method has a couple of upsides and a couple of downsides.

* Upsides
  * Passing credentials to ITs, and adding new credentials, is relatively easy.
  * Individual users can spin up their own data stores, put the credentials in the file, run their ITs, and have things just work.
* Downsides
  * Relying on a file, especially a file not checked in to the repository (to prevent people from accidentally sharing credentials to their data store, etc.), is fragile and can lead to some confusing failure cases.
  * ITs are supposed to be self-contained; using a file on disk makes things like running them in CI harder.
  * It seems like data store location, username, and password are a better fit for the IT PipelineOptions anyway.

Option 2: TestPipelineOptions

Another option is to specify them as general PipelineOptions on TestPipelineOptions and then to build the specific IT's options from there. For example, say we have MyStoreIT1, MyStoreIT2, and MyStoreIT3.
We could specify inside of TestPipelineOptions some options like "MyStoreIP", "MyStoreUsername", and "MyStorePassword", and then the command for invoking them would look like this (omitting some irrelevant things):

mvn clean verify -DskipITs=false -DbeamTestPipelineOptions='[..., "--MyStoreIP=1.2.3.4", "--MyStoreUsername=beamuser", "--MyStorePassword=hunter2"]'

* Upsides
  * Test is self-contained - no dependency on an external file, and all relevant things can be specified on the command line; easier for users and CI.
  * Passing credentials to ITs via PipelineOptions feels better.
* Downsides
  * Harder to pass different credentials to one specific IT; e.g., I want MyStoreIT1 and 2 to run against 1.2.3.4, but MyStoreIT3 to run against 9.8.7.6.
  * Investing in this pattern means a proliferation of TestPipelineOptions. Potentially bad, especially for a CI suite running a large number of ITs - the size of the command-line args may become unmanageable with 5+ data stores.

Option 3: Individual IT Options

The last option I can think of is to specify the options directly on the IT's options, e.g. MyStoreIT1Options, and set defaults which work well for CI. This means that CI could run an entire suite of ITs without specifying any arguments, trusting that the ITs' defaults will work, but it means an individual developer is potentially able to run only one IT at a time, since it will be impossible to override all options from the command line.

* Upsides
  * Test is still self-contained, and even more so - possible to specify args targeted at one IT in particular.
  * Args are specified right where they're used; way smaller chance of confusion or mistakes.
  * Easiest for CI - as long as defaults for data store auth and location are correct from the perspective of the Jenkins executor, it can essentially just turn all ITs on and run them as is.
* Downsides
  * Hardest for individual developers to run an entire suite of ITs - since defaults are configured for the CI environment, they will likely fail when run on the user's machine, resulting in annoyance for the user.

If anyone has thoughts on these, please let me know.

Best,

Jason

--
-------
Jason Kuster
Apache Beam (Incubating) / Google Cloud Dataflow
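The parent/child options pattern Luke describes (shared fields like host and port on a MyIOOptions parent interface, test-specific fields on children such as MyIOOptionsTest1), plus the fail-fast behavior Stephen asks for, can be sketched in plain Java. The `fromArgs` proxy below is a hypothetical, minimal stand-in for Beam's PipelineOptionsFactory, and the interface and flag names are illustrative only:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class OptionsSketch {

  // Shared options for every IT that talks to this store (the parent interface).
  interface MyIOOptions {
    String getHost();
    int getPort();
  }

  // Test-specific options inherit the shared host/port fields.
  interface MyIOOptionsTest1 extends MyIOOptions {
    String getTest1TableName();
  }

  // Hypothetical stand-in for Beam's PipelineOptionsFactory: backs the getters
  // of an options interface with values parsed from "--name=value" flags, and
  // fails fast with a clear message when a required flag was never set.
  @SuppressWarnings("unchecked")
  static <T> T fromArgs(Class<T> iface, String... args) {
    Map<String, String> values = new HashMap<>();
    for (String arg : args) {
      String[] kv = arg.substring(2).split("=", 2); // drop leading "--"
      values.put(kv[0], kv[1]);
    }
    InvocationHandler handler = (proxy, method, ignored) -> {
      String name = method.getName().substring(3); // strip "get"
      String key = Character.toLowerCase(name.charAt(0)) + name.substring(1);
      String raw = values.get(key);
      if (raw == null) {
        throw new IllegalArgumentException("Missing required option: --" + key);
      }
      return method.getReturnType() == int.class ? Integer.parseInt(raw) : raw;
    };
    return (T) Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface}, handler);
  }

  public static void main(String[] args) {
    MyIOOptionsTest1 opts = fromArgs(MyIOOptionsTest1.class,
        "--host=1.2.3.4", "--port=5432", "--test1TableName=scores");
    System.out.println(opts.getHost() + ":" + opts.getPort()); // prints "1.2.3.4:5432"
  }
}
```

In the real SDK, TestPipeline hands the -DbeamTestPipelineOptions flags to PipelineOptionsFactory rather than a hand-rolled proxy; the sketch only shows how shared fields land in the parent interface while per-test fields stay in the children, and how a missing option can surface as an immediate, descriptive error instead of a connection timeout.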
