Since this thread has been active recently and I feel like we're
mid-discussion, I just wanted to let folks know that I won't be checking
mail Thursday/Friday (US Thanksgiving holiday) - I'll be back next Monday.

Thanks!
Stephen

On Wed, Nov 23, 2016 at 10:03 AM Stephen Sisk <[email protected]> wrote:

> It's great to hear more experiences.
>
> I'm also glad to hear that people see real value in the high
> volume/performance benchmark tests. I tried to capture that in the Testing
> doc I shared, under "Reasons for Beam Test Strategy". [1]
>
> It does generally sound like we're in agreement here. Areas of discussion
> I see:
> 1. People like the idea of bringing up fresh instances for each test
> rather than keeping instances running all the time, since that ensures no
> contamination between tests. That seems reasonable to me. If we see
> flakiness in the tests, or we note that setting up/tearing down instances
> is taking a lot of time, we can revisit that decision.
> 2. Deciding on cluster management software/orchestration software - I want
> to make sure we land on the right tool here since choosing the wrong tool
> could result in administration of the instances taking more work. I suspect
> that's a good place for a follow up discussion, so I'll start a separate
> thread on that. I'm happy with whatever tool we choose, but I want to make
> sure we take a moment to consider different options and have a reason for
> choosing one.
>
> Etienne - thanks for being willing to port your creation/other scripts
> over. You might be a good early tester of whether this system works well
> for everyone.
>
> Stephen
>
> [1]  Reasons for Beam Test Strategy -
> https://docs.google.com/document/d/153J9jPQhMCNi_eBzJfhAg-NprQ7vbf1jNVRgdqeEE8I/edit?ts=58349aec#
>
>
>
> On Wed, Nov 23, 2016 at 12:48 AM Jean-Baptiste Onofré <[email protected]>
> wrote:
>
> I second Etienne there.
>
> We worked together on the ElasticsearchIO and definitely, the most
> valuable tests we did were integration tests with ES on docker and high
> volume.
>
> I think we have to distinguish the two kinds of tests:
> 1. utests are located in the IO itself and basically they should cover
> the core behaviors of the IO
> 2. itests are located as contrib in the IO (they could be part of the IO
> but executed by the integration-test plugin or a specific profile) and
> deal with a "real" backend and high volumes. The resources required by
> the itests can be bootstrapped by Jenkins (for instance using
> Mesos/Marathon and docker images as already discussed, and it's what I'm
> doing on my own "server").
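>
> For instance (names here are purely illustrative, this is just a sketch
> of the shape), an itest can be a plain JUnit class suffixed with IT so
> that the maven failsafe plugin (or a dedicated profile) runs it only in
> the integration-test phase, pointed at the docker containers via system
> properties:
>
> import static org.junit.Assert.assertTrue;
>
> import org.junit.Test;
>
> // Illustrative only: the *IT suffix keeps this class out of the regular
> // utest run; failsafe (or a dedicated profile) executes it in the
> // integration-test phase against the bootstrapped backend.
> public class MyBackendReadIT {
>
>   @Test
>   public void testReadAgainstRealBackend() {
>     // Host/port of the docker container bootstrapped by Jenkins
>     // (Mesos/Marathon + docker in my case), passed as system properties.
>     String host = System.getProperty("backendHost", "localhost");
>     int port = Integer.parseInt(System.getProperty("backendPort", "9200"));
>
>     // ... here the test would run a pipeline with the IO under test
>     // against host:port on the high-volume dataset loaded by the
>     // provisioning scripts, and verify the expected counts.
>     assertTrue("expected a reachable backend at " + host, port > 0);
>   }
> }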
>
> It's basically what Stephen described.
>
> We should not rely only on itests: utests are very important and they
> validate the core behavior.
>
> My $0.01 ;)
>
> Regards
> JB
>
> On 11/23/2016 09:27 AM, Etienne Chauchot wrote:
> > Hi Stephen,
> >
> > I like your proposition very much and I also agree that docker + some
> > orchestration software would be great !
> >
> > On the elasticsearchIO (PR to be created this week) there are docker
> > container creation scripts and a logstash data ingestion script for the
> > IT environment available in the contrib directory, alongside the
> > integration tests themselves. I'll be happy to make them compliant with
> > the new IT environment.
> >
> > What you say below about the need for an external IT environment is
> > particularly true. As an example, with ES what came out in the first
> > implementation was that there were problems starting at some high volume
> > of data (timeouts, ES windowing overflow...) that could not have been
> > seen on the embedded ES version. Also there were some particularities of
> > an external instance, like secondary (replica) shards, that were not
> > visible on the embedded instance.
> >
> > Besides, I also favor bringing up instances before each test because it
> > ensures (amongst other things) that we start on a fresh dataset, which
> > keeps the test deterministic.
> >
> > Etienne
> >
> >
> > Le 23/11/2016 à 02:00, Stephen Sisk a écrit :
> >> Hi,
> >>
> >> I'm excited we're getting lots of discussion going. There are many
> >> threads
> >> of conversation here, we may choose to split some of them off into a
> >> different email thread. I'm also betting I missed some of the
> >> questions in
> >> this thread, so apologies ahead of time for that. Also apologies for the
> >> amount of text, I provided some quick summaries at the top of each
> >> section.
> >>
> >> Amit - thanks for your thoughts. I've responded in detail below.
> >> Ismael - thanks for offering to help. There's plenty of work here to go
> >> around. I'll try and think about how we can divide up some next steps
> >> (probably in a separate thread.) The main next step I see is deciding
> >> between kubernetes/mesos+marathon/docker swarm - I'm working on that,
> but
> >> having lots of different thoughts on what the advantages/disadvantages
> of
> >> those are would be helpful (I'm not entirely sure of the protocol for
> >> collaborating on sub-projects like this.)
> >>
> >> These issues are all related to what kind of tests we want to write. I
> >> think a kubernetes/mesos/swarm cluster could support all the use cases
> >> we've discussed here (and thus should not block moving forward with
> >> this),
> >> but understanding what we want to test will help us understand how the
> >> cluster will be used. I'm working on a proposed user guide for testing
> IO
> >> Transforms, and I'm going to send out a link to that + a short summary
> to
> >> the list shortly so folks can get a better sense of where I'm coming
> >> from.
> >>
> >>
> >>
> >> Here's my thinking on the questions we've raised here -
> >>
> >> Embedded versions of data stores for testing
> >> --------------------
> >> Summary: yes! But we still need real data stores to test against.
> >>
> >> I am a gigantic fan of using embedded versions of the various data
> >> stores.
> >> I think we should test everything we possibly can using them, and do the
> >> majority of our correctness testing using embedded versions + the direct
> >> runner. However, it's also important to have at least one test that
> >> actually connects to an actual instance, so we can get coverage for
> >> things
> >> like credentials, real connection strings, etc...
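> >>
> >> To make that concrete, here is a rough sketch of the shape such a test
> >> could take. MyDataStoreIO and MyDataStoreTestOptions are made-up names
> >> just for illustration (not an existing transform); TestPipeline and
> >> PAssert are the usual SDK testing pieces:
> >>
> >> import org.apache.beam.sdk.options.PipelineOptions;
> >> import org.apache.beam.sdk.testing.PAssert;
> >> import org.apache.beam.sdk.testing.TestPipeline;
> >> import org.apache.beam.sdk.transforms.Count;
> >> import org.apache.beam.sdk.values.PCollection;
> >> import org.junit.Test;
> >>
> >> public class MyDataStoreIOIT {
> >>
> >>   /** Hypothetical options so the test can point at a real instance. */
> >>   public interface MyDataStoreTestOptions extends PipelineOptions {
> >>     String getConnectionString();
> >>     void setConnectionString(String value);
> >>     Long getExpectedRecordCount();
> >>     void setExpectedRecordCount(Long value);
> >>   }
> >>
> >>   @Test
> >>   public void testReadFromRealInstance() {
> >>     // Connection details come in via pipeline options, so the same test
> >>     // can target whatever instance the test infrastructure brought up.
> >>     MyDataStoreTestOptions options =
> >>         TestPipeline.testingPipelineOptions().as(MyDataStoreTestOptions.class);
> >>     TestPipeline p = TestPipeline.fromOptions(options);
> >>
> >>     PCollection<String> records =
> >>         p.apply(MyDataStoreIO.read()
> >>             .withConnectionString(options.getConnectionString()));
> >>
> >>     // Correctness details stay in the embedded-store unit tests; here we
> >>     // mostly prove that real credentials/connection strings work.
> >>     PAssert.thatSingleton(records.apply(Count.<String>globally()))
> >>         .isEqualTo(options.getExpectedRecordCount());
> >>
> >>     p.run();
> >>   }
> >> }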
> >>
> >> The key point is that embedded versions definitely can't cover the
> >> performance tests, so we need to host instances if we want to test that.
> >>
> >> I consider the integration tests/performance benchmarks to be costly
> >> things
> >> that we do only for the IO transforms with large amounts of community
> >> support/usage. A random IO transform used by a few users doesn't
> >> necessarily need integration & perf tests, but for heavily used IO
> >> transforms, there's a lot of community value in these tests. The
> >> maintenance proposal below scales with the amount of community support
> >> for
> >> a particular IO transform.
> >>
> >>
> >>
> >> Reusing data stores ("use the data stores across executions.")
> >> ------------------
> >> Summary: I favor a hybrid approach: some frequently used, very small
> >> instances that we keep up all the time + larger multi-container data
> >> store
> >> instances that we spin up for perf tests.
> >>
> >> I don't think we need to have a strong answer to this question, but I
> >> think
> >> we do need to know what range of capabilities we need, and use that to
> >> inform our requirements on the hosting infrastructure. I think
> >> kubernetes/mesos + docker can support all the scenarios I discuss below.
> >>
> >> I had been thinking of a hybrid approach - reuse some instances and
> don't
> >> reuse others. Some tests require isolation from other tests (eg.
> >> performance benchmarking), while others can easily re-use the same
> >> database/data store instance over time, provided they are written in the
> >> correct manner (eg. a simple read or write correctness integration
> tests)
> >>
> >> To me, the question of whether to use one instance over time for a
> >> test vs
> >> spin up an instance for each test comes down to a trade off between
> these
> >> factors:
> >> 1. Flakiness of spin-up of an instance - if it's super flaky, we'll
> >> want to
> >> keep more instances up and running rather than bring them up/down. (this
> >> may also vary by the data store in question)
> >> 2. Frequency of testing - if we are running tests every 5 minutes, it
> may
> >> be wasteful to bring machines up/down every time. If we run tests once a
> >> day or week, it seems wasteful to keep the machines up the whole time.
> >> 3. Isolation requirements - If tests must be isolated, it means we
> either
> >> have to bring up the instances for each test, or we have to have some
> >> sort
> >> of signaling mechanism to indicate that a given instance is in use. I
> >> strongly favor bringing up an instance per test.
> >> 4. Number/size of containers - if we need a large number of machines
> >> for a
> >> particular test, keeping them running all the time will use more
> >> resources.
> >>
> >>
> >> The major unknown to me is how flaky it'll be to spin these up. I'm
> >> hopeful/assuming they'll be pretty stable to bring up, but I think the
> >> best
> >> way to test that is to start doing it.
> >>
> >> I suspect the sweet spot is the following: have a set of very small data
> >> store instances that stay up to support small-data-size post-commit
> >> end to
> >> end tests (post-commits run frequently and the data size means the
> >> instances would not use many resources), combined with the ability to
> >> spin
> >> up larger instances for once a day/week performance benchmarks (these
> use
> >> up more resources and are used less frequently.) That's the mix I'll
> >> propose in my docs on testing IO transforms.  If spinning up new
> >> instances
> >> is cheap/non-flaky, I'd be fine with the idea of spinning up instances
> >> for
> >> each test.
> >>
> >>
> >>
> >> Management ("what's the overhead of managing such a deployment")
> >> --------------------
> >> Summary: I propose that anyone can contribute scripts for setting up
> data
> >> store instances + integration/perf tests, but if the community doesn't
> >> maintain a particular data store's tests, we disable the tests and
> >> turn off
> >> the data store instances.
> >>
> >> Management of these instances is a crucial question. First, let's break
> >> down what tasks we'll need to do on a recurring basis:
> >> 1. Ongoing maintenance (update to new versions, both instance &
> >> dependencies) - we don't want to have a lot of old versions vulnerable
> to
> >> attacks/buggy
> >> 2. Investigate breakages/regressions
> >> (I'm betting there will be more things we'll discover - let me know if
> >> you
> >> have suggestions)
> >>
> >> There's a couple goals I see:
> >> 1. We should only do sys admin work for things that give us a lot of
> >> benefit. (ie, don't build IT/perf/data store set up scripts for data
> >> stores
> >> without a large community)
> >> 2. We should do as much as possible of testing via in-memory/embedded
> >> testing (as you brought up).
> >> 3. Reduce the amount of manual administration overhead
> >>
> >> As I discussed above, I think that integration tests/performance
> >> benchmarks
> >> are costly things that we should do only for the IO transforms with
> large
> >> amounts of community support/usage. Thus, I propose that we limit the IO
> >> transforms that get integration tests & performance benchmarks to those
> >> that have community support for maintaining the data store instances.
> >>
> >> We can enforce this organically using some simple rules:
> >> 1. Investigating breakages/regressions: if a given integration/perf test
> >> starts failing and no one investigates it within a set period of time (a
> >> week?), we disable the tests and shut off the data store instances if we
> >> have instances running. When someone wants to step up and support it
> >> again,
> >> they can fix the test, check it in, and re-enable the test.
> >> 2. Ongoing maintenance: every N months, file a jira issue that is just
> >> "is
> >> the IO Transform X data store up to date?" - if the jira is not
> >> resolved in
> >> a set period of time (1 month?), the perf/integration tests are
> disabled,
> >> and the data store instances shut off.
> >>
> >> This is pretty flexible -
> >> * If a particular person or organization wants to support an IO
> >> transform,
> >> they can. If a group of people all organically organize to keep the
> tests
> >> running, they can.
> >> * It can be mostly automated - there's not a lot of central organizing
> >> work
> >> that needs to be done.
> >>
> >> Exposing the information about what IO transforms currently have running
> >> IT/perf benchmarks on the website will let users know what IO transforms
> >> are well supported.
> >>
> >> I like this solution, but I also recognize this is a tricky problem.
> This
> >> is something the community needs to be supportive of, so I'm open to
> >> other
> >> thoughts.
> >>
> >>
> >> Simulating failures in real nodes ("programmatic tests to simulate
> >> failure")
> >> -----------------
> >> Summary: 1) Focus our testing on the code in Beam 2) We should
> >> encourage a
> >> design pattern separating out network/retry logic from the main IO
> >> transform logic
> >>
> >> We *could* create instance failure in any container management software
> -
> >> we can use their programmatic APIs to determine which containers are
> >> running the instances, and ask them to kill the container in question. A
> >> slow node would be trickier, but I'm sure we could figure it out - for
> >> example, add a network proxy that would delay responses.
> >>
> >> However, I would argue that this type of testing doesn't gain us a
> >> lot, and
> >> is complicated to set up. I think it will be easier to test network
> >> errors
> >> and retry behavior in unit tests for the IO transforms.
> >>
> >> Part of the way to handle this is to separate out the read code from the
> >> network code (eg. bigtable has BigtableService). If you put the "handle
> >> errors/retry logic" code in a separate MySourceService class, you can
> >> test
> >> MySourceService against a wide variety of network errors/data store
> >> problems,
> >> and then your main IO transform tests focus on the read behavior and
> >> handling the small set of errors the MySourceService class will return.
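> >>
> >> As a rough sketch of what that separation could look like (these class
> >> names are just illustrative, not an existing Beam API):
> >>
> >> import java.io.IOException;
> >> import java.io.Serializable;
> >> import java.util.Arrays;
> >> import java.util.List;
> >>
> >> /** Hypothetical service interface the IO transform reads through. */
> >> public interface MySourceService extends Serializable {
> >>   /**
> >>    * Returns one page of records; implementations own connections, error
> >>    * translation and retries.
> >>    */
> >>   List<String> readPage(String shardId, long offset) throws IOException;
> >> }
> >>
> >> /**
> >>  * Test fake: fails the first N calls so unit tests can exercise how the
> >>  * transform handles the small set of errors the service may surface.
> >>  */
> >> class FlakyFakeService implements MySourceService {
> >>   private int remainingFailures;
> >>
> >>   FlakyFakeService(int failures) {
> >>     this.remainingFailures = failures;
> >>   }
> >>
> >>   @Override
> >>   public List<String> readPage(String shardId, long offset)
> >>       throws IOException {
> >>     if (remainingFailures-- > 0) {
> >>       throw new IOException("simulated timeout"); // what a real backend might do
> >>     }
> >>     return Arrays.asList("record-1", "record-2");
> >>   }
> >> }
> >>
> >> The production implementation would wrap the real client and its retry
> >> logic and can be unit tested against network failures on its own, while
> >> the IO transform's unit tests just inject a fake like the one above and
> >> run on the direct runner.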
> >>
> >> I also think we should focus on testing the IO Transform, not the data
> >> store - if we kill a node in a data store, it's that data store's
> >> problem,
> >> not beam's problem. As you were pointing out, there are a *large*
> >> number of
> >> possible ways that a particular data store can fail, and we would like
> to
> >> support many different data stores. Rather than try to test that each
> >> data
> >> store behaves well, we should ensure that we handle generic/expected
> >> errors
> >> in a graceful manner.
> >>
> >>
> >>
> >>
> >>
> >>
> >> Ismaël had a couple of other quick comments/questions, I'll answer here -
> >> We can use this to test other runners running on multiple machines - I
> >> agree. This is also necessary for a good performance benchmark test.
> >>
> >> "providing the test machines to mount the cluster" - we can discuss this
> >> further, but one possible option is that google may be willing to donate
> >> something to support this.
> >>
> >> "IO Consistency" - let's follow up on those questions in another thread.
> >> That's as much about the public interface we provide to users as
> anything
> >> else. I agree with your sentiment that a user should be able to expect
> >> predictable behavior from the different IO transforms.
> >>
> >> Thanks for everyone's questions/comments - I really am excited to see
> >> that
> >> people care about this :)
> >>
> >> Stephen
> >>
> >> On Tue, Nov 22, 2016 at 7:59 AM Ismaël Mejía <[email protected]> wrote:
> >>
> >>> ​Hello,
> >>>
> >>> @Stephen Thanks for your proposal, it is really interesting, I would
> >>> really
> >>> like to help with this. I have never played with Kubernetes but this
> >>> seems
> >>> a really nice chance to do something useful with it.
> >>>
> >>> We (at Talend) are testing most of the IOs using simple container
> images
> >>> and in some particular cases ‘clusters’ of containers using
> >>> docker-compose
> >>> (a little bit like Amit’s (2) proposal). It would be really nice to
> have
> >>> this at the Beam level, in particular to try to test more complex
> >>> semantics, I don’t know how programmable kubernetes is to achieve
> >>> this for
> >>> example:
> >>>
> >>> Let’s think we have a cluster of Cassandra or Kafka nodes, I would
> >>> like to
> >>> have programmatic tests to simulate failure (e.g. kill a node), or
> >>> simulate
> >>> a really slow node, to ensure that the IO behaves as expected in the
> >>> Beam
> >>> pipeline for the given runner.
> >>>
> >>> Another related idea is to improve IO consistency: Today the different
> >>> IOs have small differences in their failure behavior, and I really
> >>> would like to be able to predict with more precision what will happen
> >>> in case of errors. E.g. what is the correct behavior if I am writing to
> >>> a Kafka node and there is a network partition, does the Kafka sink
> >>> retry or not? And what if it is the JdbcIO, will it work the same, e.g.
> >>> assuming checkpointing? Or do we guarantee exactly-once writes somehow?
> >>> Today I am not sure what happens (or if the expected behavior depends
> >>> on the runner), but maybe it is just that I don't know and we already
> >>> have tests to ensure this.
> >>>
> >>> Of course both are really hard problems, but I think with your
> >>> proposal we can try to tackle them, as well as the performance ones.
> >>> And apart from the data stores, I think it will also be really nice to
> >>> be able to test the runners in a distributed manner.
> >>>
> >>> So what is the next step? How do you imagine such integration tests?
> >>> Who can provide the test machines so we can mount the cluster?
> >>>
> >>> Maybe my ideas are a bit too far away for an initial setup, but it
> >>> will be
> >>> really nice to start working on this.
> >>>
> >>> Ismael​
> >>>
> >>>
> >>> On Tue, Nov 22, 2016 at 11:00 AM, Amit Sela <[email protected]>
> >>> wrote:
> >>>
> >>>> Hi Stephen,
> >>>>
> >>>> I was wondering about how we plan to use the data stores across
> >>> executions.
> >>>> Clearly, it's best to set up a new instance (container) for every test,
> >>>> running a "standalone" store (say HBase/Cassandra for example), and once
> >>>> the test is done, tear down the instance. It should also be agnostic to
> >>> the
> >>>> runtime environment (e.g., Docker on Kubernetes).
> >>>> I'm wondering though what's the overhead of managing such a deployment
> >>>> which could become heavy and complicated as more IOs are supported and
> >>> more
> >>>> test cases introduced.
> >>>>
> >>>> Another way to go would be to have small clusters of different data
> >>> stores
> >>>> and run against new "namespaces" (while lazily evicting old ones),
> >>>> but I
> >>>> think this is less likely as maintaining a distributed instance (even
> a
> >>>> small one) for each data store sounds even more complex.
> >>>>
> >>>> A third approach would be to simply have an "embedded" in-memory
> >>>> instance of a data store as part of a test that runs against it
> >>>> (such as
> >>> an
> >>>> embedded Kafka, though not a data store).
> >>>> This is probably the simplest solution in terms of orchestration,
> >>>> but it
> >>>> depends on having a proper "embedded" implementation for an IO.
> >>>>
> >>>> Does this make sense to you? Have you considered it?
> >>>>
> >>>> Thanks,
> >>>> Amit
> >>>>
> >>>> On Tue, Nov 22, 2016 at 8:20 AM Jean-Baptiste Onofré <[email protected]
> >
> >>>> wrote:
> >>>>
> >>>>> Hi Stephen,
> >>>>>
> >>>>> as we already discussed a bit together, it sounds great! Especially I
> >>>>> like it as both an integration test platform and good coverage for IOs.
> >>>>>
> >>>>> I'm very late on this but, as said, I will share with you my Marathon
> >>>>> JSON and Mesos docker images.
> >>>>>
> >>>>> By the way, I started to experiment a bit with Kubernetes and Swarm,
> >>>>> but it's not yet complete. I will share what I have on the same github
> >>>>> repo.
> >>>>>
> >>>>> Thanks !
> >>>>> Regards
> >>>>> JB
> >>>>>
> >>>>> On 11/16/2016 11:36 PM, Stephen Sisk wrote:
> >>>>>> Hi everyone!
> >>>>>>
> >>>>>> Currently we have a good set of unit tests for our IO Transforms -
> >>>> those
> >>>>>> tend to run against in-memory versions of the data stores. However,
> >>>> we'd
> >>>>>> like to further increase our test coverage to include running them
> >>>>> against
> >>>>>> real instances of the data stores that the IO Transforms work
> against
> >>>>> (e.g.
> >>>>>> cassandra, mongodb, kafka, etc…), which means we'll need to have
> real
> >>>>>> instances of various data stores.
> >>>>>>
> >>>>>> Additionally, if we want to do performance regression detection,
> it's
> >>>>>> important to have instances of the services that behave
> >>> realistically,
> >>>>>> which isn't true of in-memory or dev versions of the services.
> >>>>>>
> >>>>>>
> >>>>>> Proposed solution
> >>>>>> -------------------------
> >>>>>> If we accept this proposal, we would create an infrastructure for
> >>>> running
> >>>>>> real instances of data stores inside of containers, using container
> >>>>>> management software like mesos/marathon, kubernetes, docker swarm,
> >>> etc…
> >>>>> to
> >>>>>> manage the instances.
> >>>>>>
> >>>>>> This would enable us to build integration tests that run against
> >>> those
> >>>>> real
> >>>>>> instances and performance tests that run against those real
> instances
> >>>>> (like
> >>>>>> those that Jason Kuster is proposing elsewhere.)
> >>>>>>
> >>>>>>
> >>>>>> Why do we need one centralized set of instances vs just having
> >>> various
> >>>>>> people host their own instances?
> >>>>>> -------------------------
> >>>>>> Reducing flakiness of tests is key. By not having dependencies from
> >>> the
> >>>>>> core project on external services/instances of data stores we have
> >>>>>> guaranteed access to the services and the group can fix issues that
> >>>>> arise.
> >>>>>> An exception would be something that has an ops team supporting it
> >>> (eg,
> >>>>>> AWS, Google Cloud or other professionally managed service) - those
> we
> >>>>> trust
> >>>>>> will be stable.
> >>>>>>
> >>>>>>
> >>>>>> There may be a lot of different data stores needed - how will we
> >>>> maintain
> >>>>>> them?
> >>>>>> -------------------------
> >>>>>> It will take work above and beyond that of a normal set of unit
> tests
> >>>> to
> >>>>>> build and maintain integration/performance tests & their data store
> >>>>>> instances.
> >>>>>>
> >>>>>> Setup & maintenance of the data store containers and data store
> >>>> instances
> >>>>>> on it must be automated. It also has to be as simple a setup as
> >>>>>> possible, and we should avoid hand-tweaking the containers -
> >>>>>> expecting checked-in scripts/dockerfiles is key.
> >>>>>>
> >>>>>> Aligned with the community ownership approach of Apache, as members of
> >>>>>> the community are excited to contribute & maintain the
> >>>>>> integration/performance tests and their data store instances, people
> >>>>>> will be able to step up and do that.
> >>>>>> If there is no longer support for maintaining a particular set of
> >>>>>> integration & performance tests and their data store instances, then
> >>> we
> >>>>> can
> >>>>>> disable those tests. We may document on the website what IO
> >>> Transforms
> >>>>> have
> >>>>>> current integration/performance tests so users know what level of
> >>>> testing
> >>>>>> the various IO Transforms have.
> >>>>>>
> >>>>>>
> >>>>>> What about requirements for the container management software
> itself?
> >>>>>> -------------------------
> >>>>>> * We should have the data store instances themselves in Docker.
> >>> Docker
> >>>>>> allows new instances to be spun up in a quick, reproducible way and
> >>> is
> >>>>>> fairly platform independent. It has wide support from a variety of
> >>>>>> different container management services.
> >>>>>> * As little admin work required as possible. Crashing instances
> >>> should
> >>>> be
> >>>>>> restarted, setup should be simple, everything possible should be
> >>>>>> scripted/scriptable.
> >>>>>> * Logs and test output should be on a publicly available website,
> >>>> without
> >>>>>> needing to log into test execution machine. Centralized capture of
> >>>>>> monitoring info/logs from instances running in the containers would
> >>>>> support
> >>>>>> this. Ideally, this would just be supported by the container
> software
> >>>> out
> >>>>>> of the box.
> >>>>>> * It'd be useful to have good persistent volume in the container
> >>>>> management
> >>>>>> software so that databases don't have to reload large data sets
> every
> >>>>> time.
> >>>>>> * The containers may be a place to execute runners themselves if we
> >>>> need
> >>>>>> larger runner instances, so it should play well with Spark, Flink,
> >>> etc…
> >>>>>> As I discussed earlier on the mailing list, it looks like hosting
> >>>> docker
> >>>>>> containers on kubernetes, docker swarm or mesos+marathon would be a
> >>>> good
> >>>>>> solution.
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Stephen Sisk
> >>>>>>
> >>>>> --
> >>>>> Jean-Baptiste Onofré
> >>>>> [email protected]
> >>>>> http://blog.nanthrax.net
> >>>>> Talend - http://www.talend.com
> >>>>>
> >
>
> --
> Jean-Baptiste Onofré
> [email protected]
> http://blog.nanthrax.net
> Talend - http://www.talend.com
>
>
