I see the need for, and like, KinesisIOTestPipelineOptions - it would allow all ITs/perf tests that need a Kinesis connection to get one.

What problem are we trying to solve with KinesisIOTest1PipelineOptions? This presumes that there would also be a KinesisIOTest2PipelineOptions? What would be different between the two? Maybe it would help if we had a scenario and used "real" names (i.e., what we'd actually call the variables).

S

On Thu, Jan 12, 2017 at 5:47 PM Jason Kuster <[email protected]> wrote:

Stephen, I agree with you about option 2 -- I think it strikes the right balance. To your question at the end, Luke essentially answered it, but to elaborate: we could make, e.g., KinesisIOTestPipelineOptions, which extends TestPipelineOptions, and then KinesisIOTest1PipelineOptions, which extends KinesisIOTestPipelineOptions (whew, that's a workout for the fingers). It requires a change to how we parse PipelineOptions inside of TestPipeline.java, which means we lose a bit of user-friendliness, but it enables this scenario, which I think is worth the tradeoff.

On Tue, Jan 10, 2017 at 4:23 PM, Lukasz Cwik <[email protected]> wrote:

PipelineOptions allows you to aggregate common options within a parent interface, with child interfaces extending and inheriting those options.

For example, you can have MyIOOptionsTest1, which extends MyIOOptions, and also MyIOOptionsTest2, which extends MyIOOptions. MyIOOptions can have all the shared fields, such as port or host or whatever, and then MyIOOptionsTest1 can have specific parameters for Test1 and MyIOOptionsTest2 can have specific parameters for Test2.

On Tue, Jan 10, 2017 at 4:11 PM, Stephen Sisk <[email protected]> wrote:

Thanks, Jason, for doing this research! There are a lot of options.

You mentioned "Steven" - I assume you're talking about me :)

As you mention in your goals, we want this to work both for individual devs working on a particular store and for the Beam CI system. I'm inclined to favor making this pretty easy for individual devs, as long as it doesn't come at the cost of making CI config painful. Given that, I suspect we'd want something that's straightforward to discover how to configure, that gives an obvious error message if it's misconfigured, and that is easy to learn.

In that world, I would lean towards options 2 & 3. I think a file that has to be maintained/configured separately outside of the repository isn't the most developer-friendly option. That may be somewhat of a workflow thing. If it were super important to keep the passwords secret, then a file outside of the repository might be the only option that would work - however, I think we can simplify the problem and secure our test servers by keeping them on a private network. (I don't think we are an important target, so I'm not too worried about defense in depth.)

Between options 2 & 3 I would favor option 2 - I think having defaults pre-set to work in the Beam CI environment will lead to a lot of potential confusion on the part of developers, since it would let them run the test and then only give them errors about connecting to a server. I think it'd be better to get errors about pipeline options not being set, or about obviously wrong/default pipeline option values, rather than "correct, but not for your environment" settings. For example, if we have valid defaults and a user only sets some of the values (i.e., sets the IP but not the port), it'd be trickier to detect when the value being used is valid. I'd want the test to show users an error that the option wasn't set, rather than have it try to connect and time out because the port isn't correct.

I also don't think that having per-IT options is a good idea.
If 2 different ITs can share a data store, we should be able to pass them the same config - that also makes me strongly favor an option where we have a shared set of options. Do we *have* to make it part of TestPipelineOptions? Can we make something like a BeamIOTestPipelineOptions?

This generally leads me to favor option 2.

S

On Tue, Jan 10, 2017 at 3:39 PM Jason Kuster <[email protected]> wrote:

Hi all,

Following up on some of the discussions already on-list, I wanted to solicit some more feedback about some implementation details regarding the IO integration tests.

As it currently stands, we mostly have IO ITs for GCP-based IO, which our GCP-based Jenkins executors handle natively, but as our integration test coverage begins to expand, we're going to run into several of the problems relevant to what Steven is doing with hosting data stores for use by ITs. I wanted to get people's feedback about how to handle passing credentials to the ITs. We have a couple of options, motivated by some goals.

Goals:

* Has to work in the Apache Beam CI environment.
* Has to run on dev machines (w/o access to the Beam CI environment).
* Only one way of passing data store config.
* An individual IT fails fast if it is run without valid config.
* IO performance tests will have a validation component (this implies we need to run the IO ITs, not just the IO IT pipelines).
* Devs working on an individual IO transform can run integration & perf tests without recreating the data stores every time.
* Devs working on a runner's IO can run all the IO integration & perf tests. They may have to recreate the data stores every time (or possibly have a manual config that helps with this). It's okay if this world is a bit awkward; it just needs to be possible.

Option 1: IO Configuration File

The first option is to read all credentials from some file stored on disk. We can define a location for an (XML, JSON, YAML, etc.) file which we can read in each IT to find the credentials that IT needs. This method has a couple of upsides and a couple of downsides.

* Upsides
  * Passing credentials to ITs, and adding new credentials, is relatively easy.
  * Individual users can spin up their own data stores, put the credentials in the file, run their ITs, and have things just work.
* Downsides
  * Relying on a file, especially a file not checked in to the repository (to prevent people from accidentally sharing credentials to their data store, etc.), is fragile and can lead to some confusing failure cases.
  * ITs are supposed to be self-contained; using a file on disk makes things like running them in CI harder.
  * It seems like data store location, username, and password are a better fit for the IT PipelineOptions anyway.

Option 2: TestPipelineOptions

Another option is to specify them as general PipelineOptions on TestPipelineOptions and then to build the specific IT's options from there. For example, say we have MyStoreIT1, MyStoreIT2, and MyStoreIT3.
We could specify inside of TestPipelineOptions some options like "MyStoreIP", "MyStoreUsername", and "MyStorePassword", and then the command for invoking them would look like this (omitting some irrelevant things):

mvn clean verify -DskipITs=false -DbeamTestPipelineOptions='[..., "--MyStoreIP=1.2.3.4", "--MyStoreUsername=beamuser", "--MyStorePassword=hunter2"]'

* Upsides
  * Test is self-contained - no dependency on an external file, and all relevant things can be specified on the command line; easier for users and CI.
  * Passing credentials to ITs via PipelineOptions feels better.
* Downsides
  * Harder to pass different credentials to one specific IT; e.g., I want MyStoreIT1 and 2 to run against 1.2.3.4, but MyStoreIT3 to run against 9.8.7.6.
  * Investing in this pattern means a proliferation of TestPipelineOptions. Potentially bad, especially for a CI suite running a large number of ITs - the size of the command-line args may become unmanageable with 5+ data stores.

Option 3: Individual IT Options

The last option I can think of is to specify the options directly on the IT's options, e.g. MyStoreIT1Options, and set defaults which work well for CI. This means that CI could run an entire suite of ITs without specifying any arguments, trusting that the ITs' defaults will work, but it means an individual developer is potentially able to run only one IT at a time, since it will be impossible to override all options from the command line.

* Upsides
  * Test is still self-contained, and even more so - possible to specify args targeted at one IT in particular.
  * Args are specified right where they're used; way smaller chance of confusion or mistakes.
  * Easiest for CI - as long as defaults for data store auth and location are correct from the perspective of the Jenkins executor, it can essentially just turn all ITs on and run them as is.
* Downsides
  * Hardest for individual developers to run an entire suite of ITs - since defaults are configured for the CI environment, they will likely fail when run on the user's machine, resulting in annoyance for the user.

If anyone has thoughts on these, please let me know.

Best,

Jason

--
-------
Jason Kuster
Apache Beam (Incubating) / Google Cloud Dataflow
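The parent/child options pattern Luke describes (shared fields like host and port on a MyIOOptions parent interface, test-specific fields on children such as MyIOOptionsTest1), plus the fail-fast behavior Stephen asks for, can be sketched in plain Java. The `fromArgs` proxy below is a hypothetical, minimal stand-in for Beam's PipelineOptionsFactory, and the interface and flag names are illustrative only:

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

public class OptionsSketch {

  // Shared options for every IT that talks to this store (the parent interface).
  interface MyIOOptions {
    String getHost();
    int getPort();
  }

  // Test-specific options inherit the shared host/port fields.
  interface MyIOOptionsTest1 extends MyIOOptions {
    String getTest1TableName();
  }

  // Hypothetical stand-in for Beam's PipelineOptionsFactory: backs the getters
  // of an options interface with values parsed from "--name=value" flags, and
  // fails fast with a clear message when a required flag was never set.
  @SuppressWarnings("unchecked")
  static <T> T fromArgs(Class<T> iface, String... args) {
    Map<String, String> values = new HashMap<>();
    for (String arg : args) {
      String[] kv = arg.substring(2).split("=", 2); // drop leading "--"
      values.put(kv[0], kv[1]);
    }
    InvocationHandler handler = (proxy, method, ignored) -> {
      String name = method.getName().substring(3); // strip "get"
      String key = Character.toLowerCase(name.charAt(0)) + name.substring(1);
      String raw = values.get(key);
      if (raw == null) {
        throw new IllegalArgumentException("Missing required option: --" + key);
      }
      return method.getReturnType() == int.class ? Integer.parseInt(raw) : raw;
    };
    return (T) Proxy.newProxyInstance(
        iface.getClassLoader(), new Class<?>[] {iface}, handler);
  }

  public static void main(String[] args) {
    MyIOOptionsTest1 opts = fromArgs(MyIOOptionsTest1.class,
        "--host=1.2.3.4", "--port=5432", "--test1TableName=scores");
    System.out.println(opts.getHost() + ":" + opts.getPort()); // prints "1.2.3.4:5432"
  }
}
```

In the real SDK, TestPipeline hands the -DbeamTestPipelineOptions flags to PipelineOptionsFactory rather than a hand-rolled proxy; the sketch only shows how shared fields land in the parent interface while per-test fields stay in the children, and how a missing option can surface as an immediate, descriptive error instead of a connection timeout.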
