Since this thread has been active recently and I feel like we're mid-discussion, I just wanted to let folks know that I won't be checking mail Thursday/Friday (US Thanksgiving holiday) - I'll be back next Monday.
Thanks! Stephen On Wed, Nov 23, 2016 at 10:03 AM Stephen Sisk <[email protected]> wrote: > It's great to hear more experiences. > > I'm also glad to hear that people see real value in the high > volume/performance benchmark tests. I tried to capture that in the Testing > doc I shared, under "Reasons for Beam Test Strategy". [1] > > It does generally sound like we're in agreement here. Areas of discussion > I see: > 1. People like the idea of bringing up fresh instances for each test > rather than keeping instances running all the time, since that ensures no > contamination between tests. That seems reasonable to me. If we see > flakiness in the tests or we note that setting up/tearing down instances is > taking a lot of time, we can revisit this decision. > 2. Deciding on cluster management software/orchestration software - I want > to make sure we land on the right tool here since choosing the wrong tool > could result in administration of the instances taking more work. I suspect > that's a good place for a follow up discussion, so I'll start a separate > thread on that. I'm happy with whatever tool we choose, but I want to make > sure we take a moment to consider different options and have a reason for > choosing one. > > Etienne - thanks for being willing to port your creation/other scripts > over. You might be a good early tester of whether this system works well > for everyone. > > Stephen > > [1] Reasons for Beam Test Strategy - > https://docs.google.com/document/d/153J9jPQhMCNi_eBzJfhAg-NprQ7vbf1jNVRgdqeEE8I/edit?ts=58349aec# > > > > On Wed, Nov 23, 2016 at 12:48 AM Jean-Baptiste Onofré <[email protected]> > wrote: > > I second Etienne there. > > We worked together on the ElasticsearchIO and definitely, the most > valuable tests we did were integration tests with ES on Docker and high > volume. > > I think we have to distinguish the two kinds of tests: > 1. utests are located in the IO itself and basically they should cover > the core behaviors of the IO > 2. 
itests are located as contrib in the IO (they could be part of the IO > but executed by the integration-test plugin or a specific profile) and > deal with a "real" backend and high volumes. The resources required by > the itests can be bootstrapped by Jenkins (for instance using > Mesos/Marathon and Docker images as already discussed, and it's what I'm > doing on my own "server"). > > It's basically what Stephen described. > > We must not rely only on itests: utests are very important and they > validate the core behavior. > > My $0.01 ;) > > Regards > JB > > On 11/23/2016 09:27 AM, Etienne Chauchot wrote: > > Hi Stephen, > > > > I like your proposition very much and I also agree that docker + some > > orchestration software would be great ! > > > > On the elasticsearchIO (PR to be created this week) there are Docker > > container creation scripts and a Logstash data ingestion script for the IT > > environment, available in the contrib directory alongside the integration > > tests themselves. I'll be happy to make them compliant with the new IT > > environment. > > > > What you say below about the need for an external IT environment is > > particularly true. As an example, with ES what came out in the first > > implementation was that there were problems starting at some high volume > > of data (timeouts, ES windowing overflow...) that could not have been seen > > on the embedded ES version. Also, there were some particularities of the > > external instance, like secondary (replica) shards, that were not visible > > on the embedded instance. > > > > Besides, I also favor bringing up instances before each test because it > > ensures (amongst other things) that we start on a fresh dataset, keeping > > the test deterministic. > > > > Etienne > > > > > > On 23/11/2016 at 02:00, Stephen Sisk wrote: > >> Hi, > >> > >> I'm excited we're getting lots of discussion going. There are many > >> threads > >> of conversation here, we may choose to split some of them off into a > >> different email thread.
I'm also betting I missed some of the > >> questions in > >> this thread, so apologies ahead of time for that. Also apologies for the > >> amount of text, I provided some quick summaries at the top of each > >> section. > >> > >> Amit - thanks for your thoughts. I've responded in detail below. > >> Ismael - thanks for offering to help. There's plenty of work here to go > >> around. I'll try and think about how we can divide up some next steps > >> (probably in a separate thread.) The main next step I see is deciding > >> between kubernetes/mesos+marathon/docker swarm - I'm working on that, > but > >> having lots of different thoughts on what the advantages/disadvantages > of > >> those are would be helpful (I'm not entirely sure of the protocol for > >> collaborating on sub-projects like this.) > >> > >> These issues are all related to what kind of tests we want to write. I > >> think a kubernetes/mesos/swarm cluster could support all the use cases > >> we've discussed here (and thus should not block moving forward with > >> this), > >> but understanding what we want to test will help us understand how the > >> cluster will be used. I'm working on a proposed user guide for testing > IO > >> Transforms, and I'm going to send out a link to that + a short summary > to > >> the list shortly so folks can get a better sense of where I'm coming > >> from. > >> > >> > >> > >> Here's my thinking on the questions we've raised here - > >> > >> Embedded versions of data stores for testing > >> -------------------- > >> Summary: yes! But we still need real data stores to test against. > >> > >> I am a gigantic fan of using embedded versions of the various data > >> stores. > >> I think we should test everything we possibly can using them, and do the > >> majority of our correctness testing using embedded versions + the direct > >> runner. 
However, it's also important to have at least one test that > >> actually connects to an actual instance, so we can get coverage for > >> things > >> like credentials, real connection strings, etc... > >> > >> The key point is that embedded versions definitely can't cover the > >> performance tests, so we need to host instances if we want to test that. > >> > >> I consider the integration tests/performance benchmarks to be costly > >> things > >> that we do only for the IO transforms with large amounts of community > >> support/usage. A random IO transform used by a few users doesn't > >> necessarily need integration & perf tests, but for heavily used IO > >> transforms, there's a lot of community value in these tests. The > >> maintenance proposal below scales with the amount of community support > >> for > >> a particular IO transform. > >> > >> > >> > >> Reusing data stores ("use the data stores across executions.") > >> ------------------ > >> Summary: I favor a hybrid approach: some frequently used, very small > >> instances that we keep up all the time + larger multi-container data > >> store > >> instances that we spin up for perf tests. > >> > >> I don't think we need to have a strong answer to this question, but I > >> think > >> we do need to know what range of capabilities we need, and use that to > >> inform our requirements on the hosting infrastructure. I think > >> kubernetes/mesos + docker can support all the scenarios I discuss below. > >> > >> I had been thinking of a hybrid approach - reuse some instances and > don't > >> reuse others. Some tests require isolation from other tests (eg. > >> performance benchmarking), while others can easily re-use the same > >> database/data store instance over time, provided they are written in the > >> correct manner (eg. 
a simple read or write correctness integration > tests) > >> > >> To me, the question of whether to use one instance over time for a > >> test vs > >> spin up an instance for each test comes down to a trade off between > these > >> factors: > >> 1. Flakiness of spin-up of an instance - if it's super flaky, we'll > >> want to > >> keep more instances up and running rather than bring them up/down. (this > >> may also vary by the data store in question) > >> 2. Frequency of testing - if we are running tests every 5 minutes, it > may > >> be wasteful to bring machines up/down every time. If we run tests once a > >> day or week, it seems wasteful to keep the machines up the whole time. > >> 3. Isolation requirements - If tests must be isolated, it means we > either > >> have to bring up the instances for each test, or we have to have some > >> sort > >> of signaling mechanism to indicate that a given instance is in use. I > >> strongly favor bringing up an instance per test. > >> 4. Number/size of containers - if we need a large number of machines > >> for a > >> particular test, keeping them running all the time will use more > >> resources. > >> > >> > >> The major unknown to me is how flaky it'll be to spin these up. I'm > >> hopeful/assuming they'll be pretty stable to bring up, but I think the > >> best > >> way to test that is to start doing it. > >> > >> I suspect the sweet spot is the following: have a set of very small data > >> store instances that stay up to support small-data-size post-commit > >> end to > >> end tests (post-commits run frequently and the data size means the > >> instances would not use many resources), combined with the ability to > >> spin > >> up larger instances for once a day/week performance benchmarks (these > use > >> up more resources and are used less frequently.) That's the mix I'll > >> propose in my docs on testing IO transforms. 
If spinning up new > >> instances > >> is cheap/non-flaky, I'd be fine with the idea of spinning up instances > >> for > >> each test. > >> > >> > >> > >> Management ("what's the overhead of managing such a deployment") > >> -------------------- > >> Summary: I propose that anyone can contribute scripts for setting up > data > >> store instances + integration/perf tests, but if the community doesn't > >> maintain a particular data store's tests, we disable the tests and > >> turn off > >> the data store instances. > >> > >> Management of these instances is a crucial question. First, let's break > >> down what tasks we'll need to do on a recurring basis: > >> 1. Ongoing maintenance (update to new versions, both instance & > >> dependencies) - we don't want to have a lot of old versions vulnerable > to > >> attacks/buggy > >> 2. Investigate breakages/regressions > >> (I'm betting there will be more things we'll discover - let me know if > >> you > >> have suggestions) > >> > >> There's a couple goals I see: > >> 1. We should only do sys admin work for things that give us a lot of > >> benefit. (ie, don't build IT/perf/data store set up scripts for data > >> stores > >> without a large community) > >> 2. We should do as much as possible of testing via in-memory/embedded > >> testing (as you brought up). > >> 3. Reduce the amount of manual administration overhead > >> > >> As I discussed above, I think that integration tests/performance > >> benchmarks > >> are costly things that we should do only for the IO transforms with > large > >> amounts of community support/usage. Thus, I propose that we limit the IO > >> transforms that get integration tests & performance benchmarks to those > >> that have community support for maintaining the data store instances. > >> > >> We can enforce this organically using some simple rules: > >> 1. 
Investigating breakages/regressions: if a given integration/perf test > >> starts failing and no one investigates it within a set period of time (a > >> week?), we disable the tests and shut off the data store instances if we > >> have instances running. When someone wants to step up and support it > >> again, > >> they can fix the test, check it in, and re-enable the test. > >> 2. Ongoing maintenance: every N months, file a jira issue that is just > >> "is > >> the IO Transform X data store up to date?" - if the jira is not > >> resolved in > >> a set period of time (1 month?), the perf/integration tests are > disabled, > >> and the data store instances shut off. > >> > >> This is pretty flexible - > >> * If a particular person or organization wants to support an IO > >> transform, > >> they can. If a group of people all organically organize to keep the > tests > >> running, they can. > >> * It can be mostly automated - there's not a lot of central organizing > >> work > >> that needs to be done. > >> > >> Exposing the information about what IO transforms currently have running > >> IT/perf benchmarks on the website will let users know what IO transforms > >> are well supported. > >> > >> I like this solution, but I also recognize this is a tricky problem. > This > >> is something the community needs to be supportive of, so I'm open to > >> other > >> thoughts. > >> > >> > >> Simulating failures in real nodes ("programmatic tests to simulate > >> failure") > >> ----------------- > >> Summary: 1) Focus our testing on the code in Beam 2) We should > >> encourage a > >> design pattern separating out network/retry logic from the main IO > >> transform logic > >> > >> We *could* create instance failure in any container management software > - > >> we can use their programmatic APIs to determine which containers are > >> running the instances, and ask them to kill the container in question. 
A > >> slow node would be trickier, but I'm sure we could figure it out - for > >> example, add a network proxy that would delay responses. > >> > >> However, I would argue that this type of testing doesn't gain us a > >> lot, and > >> is complicated to set up. I think it will be easier to test network > >> errors > >> and retry behavior in unit tests for the IO transforms. > >> > >> Part of the way to handle this is to separate out the read code from the > >> network code (eg. Bigtable has BigtableService). If you put the "handle > >> errors/retry logic" code in a separate MySourceService class, you can > >> test > >> MySourceService on the wide variety of network errors/data store > >> problems, > >> and then your main IO transform tests focus on the read behavior and > >> handling the small set of errors the MySourceService class will return. > >> > >> I also think we should focus on testing the IO Transform, not the data > >> store - if we kill a node in a data store, it's that data store's > >> problem, > >> not Beam's problem. As you were pointing out, there are a *large* > >> number of > >> possible ways that a particular data store can fail, and we would like to > >> support many different data stores. Rather than try to test that each > >> data > >> store behaves well, we should ensure that we handle generic/expected > >> errors > >> in a graceful manner. > >> > >> Ismaël had a couple of other quick comments/questions, I'll answer here - > >> We can use this to test other runners running on multiple machines - I > >> agree. This is also necessary for a good performance benchmark test. > >> > >> "providing the test machines to mount the cluster" - we can discuss this > >> further, but one possible option is that Google may be willing to donate > >> something to support this. > >> > >> "IO Consistency" - let's follow up on those questions in another thread.
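The BigtableService-style separation described above can be sketched quickly. This is a minimal Python illustration of the idea, not Beam code: `MySourceService` is the name used in the thread, while `TransientBackendError`, the injectable `fetch` callable, and the retry policy are assumptions made up for the example.

```python
class TransientBackendError(Exception):
    """Stand-in for a retryable network/data-store error."""


class MySourceService:
    """Owns connection/retry concerns so the IO transform never touches
    the network directly. Injecting a fake `fetch` callable lets unit
    tests simulate backend failures without any real instance."""

    def __init__(self, fetch, max_attempts=3):
        self._fetch = fetch          # in production: a real client call
        self._max_attempts = max_attempts

    def read(self, key):
        last_error = None
        for _ in range(self._max_attempts):
            try:
                return self._fetch(key)
            except TransientBackendError as err:
                last_error = err     # retry only on transient errors
        raise last_error


# Simulate a backend that fails twice, then succeeds:
attempts = []

def flaky_fetch(key):
    attempts.append(key)
    if len(attempts) < 3:
        raise TransientBackendError()
    return "value-for-" + key

service = MySourceService(flaky_fetch)
assert service.read("k1") == "value-for-k1"
assert len(attempts) == 3
```

An IO transform built on top of such a service only needs tests for the handful of errors the service can still surface, while the flaky-network cases are covered by cheap unit tests like the one above.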
> >> That's as much about the public interface we provide to users as > >> anything > >> else. I agree with your sentiment that a user should be able to expect > >> predictable behavior from the different IO transforms. > >> > >> Thanks for everyone's questions/comments - I really am excited to see > >> that > >> people care about this :) > >> > >> Stephen > >> > >> On Tue, Nov 22, 2016 at 7:59 AM Ismaël Mejía <[email protected]> wrote: > >> > >>> Hello, > >>> > >>> @Stephen Thanks for your proposal, it is really interesting, I would > >>> really > >>> like to help with this. I have never played with Kubernetes but this > >>> seems > >>> a really nice chance to do something useful with it. > >>> > >>> We (at Talend) are testing most of the IOs using simple container images > >>> and in some particular cases ‘clusters’ of containers using > >>> docker-compose > >>> (a little bit like Amit’s (2) proposal). It would be really nice to have > >>> this at the Beam level, in particular to try to test more complex > >>> semantics; I don’t know how programmable Kubernetes is to achieve > >>> this, for > >>> example: > >>> > >>> Let’s say we have a cluster of Cassandra or Kafka nodes. I would > >>> like to > >>> have programmatic tests to simulate failure (e.g. kill a node), or > >>> simulate > >>> a really slow node, to ensure that the IO behaves as expected in the > >>> Beam > >>> pipeline for the given runner. > >>> > >>> Another related idea is to improve IO consistency: today the > >>> different IOs > >>> have small differences in their failure behavior, and I really would like > >>> to be > >>> able to predict with more precision what will happen in case of errors, > >>> e.g. what is the correct behavior if I am writing to a Kafka node and > >>> there > >>> is a network partition - does the Kafka sink retry or not? And what > >>> if it > >>> is the JdbcIO? Will it work the same, e.g. assuming checkpointing?
> >>> Or do > >>> we guarantee exactly-once writes somehow? Today I am not sure about > >>> what > >>> happens (or if the expected behavior depends on the runner), but well > >>> maybe > >>> it is just that I don’t know and we have tests to ensure this. > >>> > >>> Of course both are really hard problems, but I think with your > >>> proposal we > >>> can try to tackle them, as well as the performance ones. And apart from > >>> the > >>> data stores, I think it will also be really nice to be able to test the > >>> runners in a distributed manner. > >>> > >>> So what is the next step? How do you imagine such integration tests? > >>> Who > >>> can provide the test machines so we can mount the cluster? > >>> > >>> Maybe my ideas are a bit too far away for an initial setup, but it > >>> will be > >>> really nice to start working on this. > >>> > >>> Ismael > >>> > >>> > >>> On Tue, Nov 22, 2016 at 11:00 AM, Amit Sela <[email protected]> > >>> wrote: > >>> > >>>> Hi Stephen, > >>>> > >>>> I was wondering about how we plan to use the data stores across > >>> executions. > >>>> Clearly, it's best to set up a new instance (container) for every test, > >>>> running a "standalone" store (say HBase/Cassandra for example), and > >>>> once > >>>> the test is done, tear down the instance. It should also be agnostic to > >>> the > >>>> runtime environment (e.g., Docker on Kubernetes). > >>>> I'm wondering though what's the overhead of managing such a deployment, > >>>> which could become heavy and complicated as more IOs are supported and > >>> more > >>>> test cases introduced. > >>>> > >>>> Another way to go would be to have small clusters of different data > >>> stores > >>>> and run against new "namespaces" (while lazily evicting old ones), > >>>> but I > >>>> think this is less likely as maintaining a distributed instance (even a > >>>> small one) for each data store sounds even more complex.
> >>>> > >>>> A third approach would be to simply have an "embedded" in-memory > >>>> instance of a data store as part of a test that runs against it > >>>> (such as > >>> an > >>>> embedded Kafka, though not a data store). > >>>> This is probably the simplest solution in terms of orchestration, > >>>> but it > >>>> depends on having a proper "embedded" implementation for an IO. > >>>> > >>>> Does this make sense to you? Have you considered it? > >>>> > >>>> Thanks, > >>>> Amit > >>>> > >>>> On Tue, Nov 22, 2016 at 8:20 AM Jean-Baptiste Onofré <[email protected] > > > >>>> wrote: > >>>> > >>>>> Hi Stephen, > >>>>> > >>>>> as already discussed a bit together, it sounds great ! Especially I > >>> like > >>>>> it as both an integration test platform and good coverage for IOs. > >>>>> > >>>>> I'm very late on this but, as said, I will share with you my Marathon > >>>>> JSON and Mesos Docker images. > >>>>> > >>>>> By the way, I started to experiment a bit with Kubernetes and Swarm but > >>>>> it's > >>>>> not yet complete. I will share what I have on the same GitHub repo. > >>>>> > >>>>> Thanks ! > >>>>> Regards > >>>>> JB > >>>>> > >>>>> On 11/16/2016 11:36 PM, Stephen Sisk wrote: > >>>>>> Hi everyone! > >>>>>> > >>>>>> Currently we have a good set of unit tests for our IO Transforms - > >>>> those > >>>>>> tend to run against in-memory versions of the data stores. However, > >>>> we'd > >>>>>> like to further increase our test coverage to include running them > >>>>> against > >>>>>> real instances of the data stores that the IO Transforms work against > >>>>> (e.g. > >>>>>> Cassandra, MongoDB, Kafka, etc…), which means we'll need to have real > >>>>>> instances of various data stores. > >>>>>> > >>>>>> Additionally, if we want to do performance regression detection, it's > >>>>>> important to have instances of the services that behave > >>> realistically, > >>>>>> which isn't true of in-memory or dev versions of the services.
> >>>>>> > >>>>>> > >>>>>> Proposed solution > >>>>>> ------------------------- > >>>>>> If we accept this proposal, we would create an infrastructure for > >>>> running > >>>>>> real instances of data stores inside of containers, using container > >>>>>> management software like mesos/marathon, kubernetes, docker swarm, > >>> etc… > >>>>> to > >>>>>> manage the instances. > >>>>>> > >>>>>> This would enable us to build integration tests that run against > >>> those > >>>>> real > >>>>>> instances and performance tests that run against those real > instances > >>>>> (like > >>>>>> those that Jason Kuster is proposing elsewhere.) > >>>>>> > >>>>>> > >>>>>> Why do we need one centralized set of instances vs just having > >>> various > >>>>>> people host their own instances? > >>>>>> ------------------------- > >>>>>> Reducing flakiness of tests is key. By not having dependencies from > >>> the > >>>>>> core project on external services/instances of data stores we have > >>>>>> guaranteed access to the services and the group can fix issues that > >>>>> arise. > >>>>>> An exception would be something that has an ops team supporting it > >>> (eg, > >>>>>> AWS, Google Cloud or other professionally managed service) - those > we > >>>>> trust > >>>>>> will be stable. > >>>>>> > >>>>>> > >>>>>> There may be a lot of different data stores needed - how will we > >>>> maintain > >>>>>> them? > >>>>>> ------------------------- > >>>>>> It will take work above and beyond that of a normal set of unit > tests > >>>> to > >>>>>> build and maintain integration/performance tests & their data store > >>>>>> instances. > >>>>>> > >>>>>> Setup & maintenance of the data store containers and data store > >>>> instances > >>>>>> on it must be automated. It also has to be as simple of a setup as > >>>>>> possible, and we should avoid hand tweaking the containers - > >>> expecting > >>>>>> checked in scripts/dockerfiles is key. 
> >>>>>> > >>>>>> Aligned with the community ownership approach of Apache, as members > >>> of > >>>>> the > >>>>>> community are excited to contribute & maintain those tests and the > >>>>>> integration/performance tests, people will be able to step up and do > >>>>> that. > >>>>>> If there is no longer support for maintaining a particular set of > >>>>>> integration & performance tests and their data store instances, then > >>> we > >>>>> can > >>>>>> disable those tests. We may document on the website what IO > >>> Transforms > >>>>> have > >>>>>> current integration/performance tests so users know what level of > >>>> testing > >>>>>> the various IO Transforms have. > >>>>>> > >>>>>> > >>>>>> What about requirements for the container management software > itself? > >>>>>> ------------------------- > >>>>>> * We should have the data store instances themselves in Docker. > >>> Docker > >>>>>> allows new instances to be spun up in a quick, reproducible way and > >>> is > >>>>>> fairly platform independent. It has wide support from a variety of > >>>>>> different container management services. > >>>>>> * As little admin work required as possible. Crashing instances > >>> should > >>>> be > >>>>>> restarted, setup should be simple, everything possible should be > >>>>>> scripted/scriptable. > >>>>>> * Logs and test output should be on a publicly available website, > >>>> without > >>>>>> needing to log into test execution machine. Centralized capture of > >>>>>> monitoring info/logs from instances running in the containers would > >>>>> support > >>>>>> this. Ideally, this would just be supported by the container > software > >>>> out > >>>>>> of the box. > >>>>>> * It'd be useful to have good persistent volume in the container > >>>>> management > >>>>>> software so that databases don't have to reload large data sets > every > >>>>> time. 
> >>>>>> * The containers may be a place to execute runners themselves if we > >>>> need > >>>>>> larger runner instances, so it should play well with Spark, Flink, > >>> etc… > >>>>>> As I discussed earlier on the mailing list, it looks like hosting > >>>> docker > >>>>>> containers on kubernetes, docker swarm or mesos+marathon would be a > >>>> good > >>>>>> solution. > >>>>>> > >>>>>> Thanks, > >>>>>> Stephen Sisk > >>>>>> > >>>>> -- > >>>>> Jean-Baptiste Onofré > >>>>> [email protected] > >>>>> http://blog.nanthrax.net > >>>>> Talend - http://www.talend.com > >>>>> > > > > -- > Jean-Baptiste Onofré > [email protected] > http://blog.nanthrax.net > Talend - http://www.talend.com > >
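For reference, the fresh-instance-per-test flow discussed throughout this thread could be scripted roughly along these lines. This is a hedged Python sketch: the `docker run`/`docker rm` invocations are ordinary Docker CLI usage, but the helper name, the image tag, and the injectable `runner` hook are illustrative assumptions, not anything the thread specifies.

```python
import subprocess

def with_fresh_container(image, port_mapping, test_fn, runner=subprocess.run):
    """Start a throwaway data-store container, run test_fn against it,
    and always tear the container down afterwards so each test starts
    from a fresh dataset (no contamination between tests)."""
    started = runner(
        ["docker", "run", "-d", "-p", port_mapping, image],
        capture_output=True, text=True, check=True,
    )
    container_id = started.stdout.strip()
    try:
        return test_fn(container_id)
    finally:
        # Teardown runs even if the test raised, guaranteeing cleanup.
        runner(["docker", "rm", "-f", container_id], check=True)
```

An integration or perf test could then call `with_fresh_container("cassandra:3.9", "9042:9042", run_my_io_test)`; because `runner` is injectable, the orchestration logic itself can be unit-tested without Docker installed.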
