Re: Automated Testing w/ Kafka Streams

2016-08-16 Thread Guozhang Wang
About moving some streams test utils into a separate package: I think this
has been requested before, and a JIRA has been filed:

https://issues.apache.org/jira/browse/KAFKA-3625


Guozhang

On Tue, Aug 16, 2016 at 10:18 AM, Michael Noll  wrote:

> Addendum:
>
> > Unfortunately, Apache Kafka does not publish these testing facilities as
> maven artifacts -- that's why everyone is rolling their own.
>
> Some testing facilities (like kafka.utils.TestUtils) are published via
> maven, but other helpful testing facilities are not.
>
> Since Radek provided a snippet showing how to pull in the artifact that includes
> k.u.TestUtils, here's the same snippet for Maven/pom.xml, with dependency
> scope set to `test`:
>
>   
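>   <dependency>
>     <groupId>org.apache.kafka</groupId>
>     <artifactId>kafka_2.11</artifactId>
>     <version>0.10.0.0</version>
>     <classifier>test</classifier>
>     <scope>test</scope>
>   </dependency>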
>   
>
>
>
> On Tue, Aug 16, 2016 at 7:14 PM, Michael Noll 
> wrote:
>
> > Mathieu,
> >
> > FWIW here are some pointers to run embedded Kafka/ZK instances for
> > integration testing.  The second block of references below uses Curator's
> > TestingServer for running embedded ZK instances.  See also the relevant
> > pom.xml for how the integration tests are being run (e.g. disabled JVM
> > reuse to ensure test isolation).
> >
> > Unfortunately, Apache Kafka does not publish these testing facilities as
> > maven artifacts -- that's why everyone is rolling their own.
> >
> > In Apache Kafka:
> >
> > Helper classes (e.g. embedded Kafka)
> > https://github.com/apache/kafka/tree/trunk/streams/src/
> > test/java/org/apache/kafka/streams/integration/utils
> >
> > Integration test example:
> > https://github.com/apache/kafka/blob/trunk/streams/src/
> > test/java/org/apache/kafka/streams/integration/
> FanoutIntegrationTest.java
> >
> > Also, for kafka.utils.TestUtils usage:
> > https://github.com/apache/kafka/blob/trunk/core/src/
> > test/scala/integration/kafka/api/IntegrationTestHarness.scala
> >
> > In confluentinc/examples:
> >
> > Helper classes (e.g. embedded Kafka, embedded Confluent Schema
> > Registry for Avro testing)
> > https://github.com/confluentinc/examples/tree/
> > kafka-0.10.0.0-cp-3.0.0/kafka-streams/src/test/java/io/
> > confluent/examples/streams/kafka
> >
> > Some more sophisticated integration tests:
> > https://github.com/confluentinc/examples/blob/
> > kafka-0.10.0.0-cp-3.0.0/kafka-streams/src/test/java/io/
> > confluent/examples/streams/WordCountLambdaIntegrationTest.java
> > https://github.com/confluentinc/examples/blob/
> > kafka-0.10.0.0-cp-3.0.0/kafka-streams/src/test/java/io/
> > confluent/examples/streams/SpecificAvroIntegrationTest.java
> >
> > Best,
> > Michael
> >
> >
> >
> >
> > On Tue, Aug 16, 2016 at 3:36 PM, Mathieu Fenniak <
> > mathieu.fenn...@replicon.com> wrote:
> >
> >> Hi Radek,
> >>
> >> No, I'm not familiar with these tools.  I see that Curator's
> TestingServer
> >> looks pretty straightforward, but I'm not really sure what
> >> kafka.utils.TestUtils
> >> is.  I can't find any documentation referring to this, and it doesn't
> seem
> >> to be a part of any published maven artifacts in the Kafka project; can
> >> you
> >> point me at what you're using a little more specifically?
> >>
> >> Mathieu
> >>
> >>
> >> On Mon, Aug 15, 2016 at 2:39 PM, Radoslaw Gruchalski <
> >> ra...@gruchalski.com>
> >> wrote:
> >>
> >> > Out of curiosity, are you aware of kafka.utils.TestUtils and Apache
> >> Curator
> >> > TestingServer?
> >> > I’m using this successfully to test publish/consume scenarios with
> >> things
> >> > like Flink, Spark and custom apps.
> >> > What would stop you from taking the same approach?
> >> >
> >> > –
> >> > Best regards,
> >> > Radek Gruchalski
> >> > ra...@gruchalski.com
> >> >
> >> >
> >> > On August 15, 2016 at 9:41:37 PM, Mathieu Fenniak (
> >> > mathieu.fenn...@replicon.com) wrote:
> >> >
> >> > Hi Michael,
> >> >
> >> > It would definitely be an option. I am not currently doing any testing
> >> > like that; it could replace the ProcessorTopologyTestDriver-style
> >> testing
> >> > that I'd like to do, but there are some trade-offs to consider:
> >> >
> >> > - I can't do an isolated test of just the TopologyBuilder; I'd be
> >> > bringing in configuration management code (eg. configuring where to
> >> access
> >> > ZK + Kafka).
> >> > - Tests using a running Kafka server wouldn't have a clear end-point;
> if
> >> > something in the topology doesn't publish a message where I expected it
> >> to,
> >> > my test can only fail via a timeout.
> >> > - Tests are likely to be slower; this might not be significant, but a
> >> > small difference in test speed has a big impact in productivity after
> a
> >> > few
> >> > months of development
> >> > - Tests will be more complex & fragile; some additional component
> needs
> >> > to manage starting up that Kafka server, making sure it's ready-to-go,
> >> > running tests, and then tearing it down
> >> > - Tests will have to be cautious of state 

Re: Automated Testing w/ Kafka Streams

2016-08-16 Thread Michael Noll
Addendum:

> Unfortunately, Apache Kafka does not publish these testing facilities as
maven artifacts -- that's why everyone is rolling their own.

Some testing facilities (like kafka.utils.TestUtils) are published via
maven, but other helpful testing facilities are not.

Since Radek provided a snippet showing how to pull in the artifact that includes
k.u.TestUtils, here's the same snippet for Maven/pom.xml, with dependency
scope set to `test`:

  
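  <dependency>
    <groupId>org.apache.kafka</groupId>
    <artifactId>kafka_2.11</artifactId>
    <version>0.10.0.0</version>
    <classifier>test</classifier>
    <scope>test</scope>
  </dependency>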
  



On Tue, Aug 16, 2016 at 7:14 PM, Michael Noll  wrote:

> Mathieu,
>
> FWIW here are some pointers to run embedded Kafka/ZK instances for
> integration testing.  The second block of references below uses Curator's
> TestingServer for running embedded ZK instances.  See also the relevant
> pom.xml for how the integration tests are being run (e.g. disabled JVM
> reuse to ensure test isolation).
>
> Unfortunately, Apache Kafka does not publish these testing facilities as
> maven artifacts -- that's why everyone is rolling their own.
>
> In Apache Kafka:
>
> Helper classes (e.g. embedded Kafka)
> https://github.com/apache/kafka/tree/trunk/streams/src/
> test/java/org/apache/kafka/streams/integration/utils
>
> Integration test example:
> https://github.com/apache/kafka/blob/trunk/streams/src/
> test/java/org/apache/kafka/streams/integration/FanoutIntegrationTest.java
>
> Also, for kafka.utils.TestUtils usage:
> https://github.com/apache/kafka/blob/trunk/core/src/
> test/scala/integration/kafka/api/IntegrationTestHarness.scala
>
> In confluentinc/examples:
>
> Helper classes (e.g. embedded Kafka, embedded Confluent Schema
> Registry for Avro testing)
> https://github.com/confluentinc/examples/tree/
> kafka-0.10.0.0-cp-3.0.0/kafka-streams/src/test/java/io/
> confluent/examples/streams/kafka
>
> Some more sophisticated integration tests:
> https://github.com/confluentinc/examples/blob/
> kafka-0.10.0.0-cp-3.0.0/kafka-streams/src/test/java/io/
> confluent/examples/streams/WordCountLambdaIntegrationTest.java
> https://github.com/confluentinc/examples/blob/
> kafka-0.10.0.0-cp-3.0.0/kafka-streams/src/test/java/io/
> confluent/examples/streams/SpecificAvroIntegrationTest.java
>
> Best,
> Michael
>
>
>
>
> On Tue, Aug 16, 2016 at 3:36 PM, Mathieu Fenniak <
> mathieu.fenn...@replicon.com> wrote:
>
>> Hi Radek,
>>
>> No, I'm not familiar with these tools.  I see that Curator's TestingServer
>> looks pretty straightforward, but I'm not really sure what
>> kafka.utils.TestUtils
>> is.  I can't find any documentation referring to this, and it doesn't seem
>> to be a part of any published maven artifacts in the Kafka project; can
>> you
>> point me at what you're using a little more specifically?
>>
>> Mathieu
>>
>>
>> On Mon, Aug 15, 2016 at 2:39 PM, Radoslaw Gruchalski <
>> ra...@gruchalski.com>
>> wrote:
>>
>> > Out of curiosity, are you aware of kafka.utils.TestUtils and Apache
>> Curator
>> > TestingServer?
>> > I’m using this successfully to test publish/consume scenarios with
>> things
>> > like Flink, Spark and custom apps.
>> > What would stop you from taking the same approach?
>> >
>> > –
>> > Best regards,
>> > Radek Gruchalski
>> > ra...@gruchalski.com
>> >
>> >
>> > On August 15, 2016 at 9:41:37 PM, Mathieu Fenniak (
>> > mathieu.fenn...@replicon.com) wrote:
>> >
>> > Hi Michael,
>> >
>> > It would definitely be an option. I am not currently doing any testing
>> > like that; it could replace the ProcessorTopologyTestDriver-style
>> testing
>> > that I'd like to do, but there are some trade-offs to consider:
>> >
>> > - I can't do an isolated test of just the TopologyBuilder; I'd be
>> > bringing in configuration management code (eg. configuring where to
>> access
>> > ZK + Kafka).
>> > - Tests using a running Kafka server wouldn't have a clear end-point; if
>> > something in the topology doesn't publish a message where I expected it
>> to,
>> > my test can only fail via a timeout.
>> > - Tests are likely to be slower; this might not be significant, but a
>> > small difference in test speed has a big impact in productivity after a
>> > few
>> > months of development
>> > - Tests will be more complex & fragile; some additional component needs
>> > to manage starting up that Kafka server, making sure it's ready-to-go,
>> > running tests, and then tearing it down
>> > - Tests will have to be cautious of state existing in Kafka. eg. two
>> > test suites that touch the same topics could be influenced by state of a
>> > previous test. Either you take a "destroy the world" approach between
>> test
>> > cases (or test suites), which probably makes test speed much worse, or,
>> > you
>> > find another way to isolate test's state.
>> >
>> > I'd have to face all these problems at the higher level that I'm calling
>> > "systems-level tests", but, I think it would be better to do the
>> majority
>> > of the automated testing at a lower level 

Re: Automated Testing w/ Kafka Streams

2016-08-16 Thread Michael Noll
Mathieu,

FWIW here are some pointers to run embedded Kafka/ZK instances for
integration testing.  The second block of references below uses Curator's
TestingServer for running embedded ZK instances.  See also the relevant
pom.xml for how the integration tests are being run (e.g. disabled JVM
reuse to ensure test isolation).

Unfortunately, Apache Kafka does not publish these testing facilities as
maven artifacts -- that's why everyone is rolling their own.

In Apache Kafka:

Helper classes (e.g. embedded Kafka)

https://github.com/apache/kafka/tree/trunk/streams/src/test/java/org/apache/kafka/streams/integration/utils

Integration test example:

https://github.com/apache/kafka/blob/trunk/streams/src/test/java/org/apache/kafka/streams/integration/FanoutIntegrationTest.java

Also, for kafka.utils.TestUtils usage:

https://github.com/apache/kafka/blob/trunk/core/src/test/scala/integration/kafka/api/IntegrationTestHarness.scala

In confluentinc/examples:

Helper classes (e.g. embedded Kafka, embedded Confluent Schema Registry
for Avro testing)

https://github.com/confluentinc/examples/tree/kafka-0.10.0.0-cp-3.0.0/kafka-streams/src/test/java/io/confluent/examples/streams/kafka

Some more sophisticated integration tests:

https://github.com/confluentinc/examples/blob/kafka-0.10.0.0-cp-3.0.0/kafka-streams/src/test/java/io/confluent/examples/streams/WordCountLambdaIntegrationTest.java

https://github.com/confluentinc/examples/blob/kafka-0.10.0.0-cp-3.0.0/kafka-streams/src/test/java/io/confluent/examples/streams/SpecificAvroIntegrationTest.java

Best,
Michael




On Tue, Aug 16, 2016 at 3:36 PM, Mathieu Fenniak <
mathieu.fenn...@replicon.com> wrote:

> Hi Radek,
>
> No, I'm not familiar with these tools.  I see that Curator's TestingServer
> looks pretty straightforward, but I'm not really sure what
> kafka.utils.TestUtils
> is.  I can't find any documentation referring to this, and it doesn't seem
> to be a part of any published maven artifacts in the Kafka project; can you
> point me at what you're using a little more specifically?
>
> Mathieu
>
>
> On Mon, Aug 15, 2016 at 2:39 PM, Radoslaw Gruchalski  >
> wrote:
>
> > Out of curiosity, are you aware of kafka.utils.TestUtils and Apache
> Curator
> > TestingServer?
> > I’m using this successfully to test publish/consume scenarios with
> things
> > like Flink, Spark and custom apps.
> > What would stop you from taking the same approach?
> >
> > –
> > Best regards,
> > Radek Gruchalski
> > ra...@gruchalski.com
> >
> >
> > On August 15, 2016 at 9:41:37 PM, Mathieu Fenniak (
> > mathieu.fenn...@replicon.com) wrote:
> >
> > Hi Michael,
> >
> > It would definitely be an option. I am not currently doing any testing
> > like that; it could replace the ProcessorTopologyTestDriver-style
> testing
> > that I'd like to do, but there are some trade-offs to consider:
> >
> > - I can't do an isolated test of just the TopologyBuilder; I'd be
> > bringing in configuration management code (eg. configuring where to
> access
> > ZK + Kafka).
> > - Tests using a running Kafka server wouldn't have a clear end-point; if
> > something in the topology doesn't publish a message where I expected it
> to,
> > my test can only fail via a timeout.
> > - Tests are likely to be slower; this might not be significant, but a
> > small difference in test speed has a big impact in productivity after a
> > few
> > months of development
> > - Tests will be more complex & fragile; some additional component needs
> > to manage starting up that Kafka server, making sure it's ready-to-go,
> > running tests, and then tearing it down
> > - Tests will have to be cautious of state existing in Kafka. eg. two
> > test suites that touch the same topics could be influenced by state of a
> > previous test. Either you take a "destroy the world" approach between
> test
> > cases (or test suites), which probably makes test speed much worse, or,
> > you
> > find another way to isolate test's state.
> >
> > I'd have to face all these problems at the higher level that I'm calling
> > "systems-level tests", but, I think it would be better to do the majority
> > of the automated testing at a lower level that doesn't bring these
> > considerations into play.
> >
> > Mathieu
> >
> >
> > On Mon, Aug 15, 2016 at 12:13 PM, Michael Noll 
> > wrote:
> >
> > > Mathieu,
> > >
> > > follow-up question: Are you also doing or considering integration
> > testing
> > > by spawning a local Kafka cluster and then reading/writing to that
> > cluster
> > > (often called embedded or in-memory cluster)? This approach would be in
> > > the middle between ProcessorTopologyTestDriver (that does not spawn a
> > Kafka
> > > cluster) and your system-level testing (which I suppose is running
> > against
> > > a "real" test Kafka cluster).
> > >
> > > -Michael
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Aug 15, 2016 at 3:44 PM, Mathieu Fenniak <
> > > mathieu.fenn...@replicon.com> 

Re: Automated Testing w/ Kafka Streams

2016-08-16 Thread Mathieu Fenniak
Hi Guozhang,

Thanks for the feedback.  What would you think about
including ProcessorTopologyTestDriver in a released artifact from kafka
streams in a future release?  Or alternatively, what other approach would
you recommend to incorporating it into another project's tests?  I can copy
it wholesale into my project and it works fine, but I'll have to keep it
up-to-date by hand, which isn't ideal. :-)

Mathieu


On Mon, Aug 15, 2016 at 3:24 PM, Guozhang Wang  wrote:

> Mathieu,
>
> Your composition of Per-module Unit Tests + ProcessorTopologyTestDriver +
> System Tests looks good to me, and I agree with you that since this is part
> of your pre-commit process, which could be triggered concurrently from
> different developers / teams, EmbeddedSingleNodeKafkaCluster +
> EmbeddedZookeeper may not work best for you.
>
>
> Guozhang
>
>
> On Mon, Aug 15, 2016 at 1:39 PM, Radoslaw Gruchalski  >
> wrote:
>
> > Out of curiosity, are you aware of kafka.utils.TestUtils and Apache
> Curator
> > TestingServer?
> > I’m using this successfully to test publish/consume scenarios with
> things
> > like Flink, Spark and custom apps.
> > What would stop you from taking the same approach?
> >
> > –
> > Best regards,
> > Radek Gruchalski
> > ra...@gruchalski.com
> >
> >
> > On August 15, 2016 at 9:41:37 PM, Mathieu Fenniak (
> > mathieu.fenn...@replicon.com) wrote:
> >
> > Hi Michael,
> >
> > It would definitely be an option. I am not currently doing any testing
> > like that; it could replace the ProcessorTopologyTestDriver-style
> testing
> > that I'd like to do, but there are some trade-offs to consider:
> >
> > - I can't do an isolated test of just the TopologyBuilder; I'd be
> > bringing in configuration management code (eg. configuring where to
> access
> > ZK + Kafka).
> > - Tests using a running Kafka server wouldn't have a clear end-point; if
> > something in the topology doesn't publish a message where I expected it
> to,
> > my test can only fail via a timeout.
> > - Tests are likely to be slower; this might not be significant, but a
> > small difference in test speed has a big impact in productivity after a
> few
> > months of development
> > - Tests will be more complex & fragile; some additional component needs
> > to manage starting up that Kafka server, making sure it's ready-to-go,
> > running tests, and then tearing it down
> > - Tests will have to be cautious of state existing in Kafka. eg. two
> > test suites that touch the same topics could be influenced by state of a
> > previous test. Either you take a "destroy the world" approach between
> test
> > cases (or test suites), which probably makes test speed much worse, or,
> you
> > find another way to isolate test's state.
> >
> > I'd have to face all these problems at the higher level that I'm calling
> > "systems-level tests", but, I think it would be better to do the majority
> > of the automated testing at a lower level that doesn't bring these
> > considerations into play.
> >
> > Mathieu
> >
> >
> > On Mon, Aug 15, 2016 at 12:13 PM, Michael Noll 
> > wrote:
> >
> > > Mathieu,
> > >
> > > follow-up question: Are you also doing or considering integration
> testing
> > > by spawning a local Kafka cluster and then reading/writing to that
> > cluster
> > > (often called embedded or in-memory cluster)? This approach would be in
> > > the middle between ProcessorTopologyTestDriver (that does not spawn a
> > Kafka
> > > cluster) and your system-level testing (which I suppose is running
> > against
> > > a "real" test Kafka cluster).
> > >
> > > -Michael
> > >
> > >
> > >
> > >
> > >
> > > On Mon, Aug 15, 2016 at 3:44 PM, Mathieu Fenniak <
> > > mathieu.fenn...@replicon.com> wrote:
> > >
> > > > Hey all,
> > > >
> > > > At my workplace, we have a real focus on software automated testing.
> > I'd
> > > > love to be able to test the composition of a TopologyBuilder with
> > > > org.apache.kafka.test.ProcessorTopologyTestDriver
> > > >  > > > 317b95efa4/streams/src/test/java/org/apache/kafka/test/
> > > > ProcessorTopologyTestDriver.java>;
> > > > has there ever been any thought given to making this part of the
> public
> > > API
> > > > of Kafka Streams?
> > > >
> > > > For some background, here are some details on the automated testing
> > plan
> > > > that I have in mind for a Kafka Streams application. Our goal is to
> > > enable
> > > > continuous deployment of any new development we do, so, it has to be
> > > > rigorously tested with complete automation.
> > > >
> > > > As part of our pre-commit testing, we'd first have these gateways; no
> > > code
> > > > would reach our master branch without passing these tests:
> > > >
> > > > - At the finest level, unit tests covering individual pieces like a
> > > > Serde, ValueMapper, ValueJoiner, aggregate adder/subtractor, etc.
> > > These
> > > > pieces are very isolated, very easy 

Re: Automated Testing w/ Kafka Streams

2016-08-16 Thread Mathieu Fenniak
Hi Radek,

No, I'm not familiar with these tools.  I see that Curator's TestingServer
looks pretty straightforward, but I'm not really sure what
kafka.utils.TestUtils
is.  I can't find any documentation referring to this, and it doesn't seem
to be a part of any published maven artifacts in the Kafka project; can you
point me at what you're using a little more specifically?

Mathieu


On Mon, Aug 15, 2016 at 2:39 PM, Radoslaw Gruchalski 
wrote:

> Out of curiosity, are you aware of kafka.utils.TestUtils and Apache Curator
> TestingServer?
> I’m using this successfully to test publish/consume scenarios with things
> like Flink, Spark and custom apps.
> What would stop you from taking the same approach?
>
> –
> Best regards,
> Radek Gruchalski
> ra...@gruchalski.com
>
>
> On August 15, 2016 at 9:41:37 PM, Mathieu Fenniak (
> mathieu.fenn...@replicon.com) wrote:
>
> Hi Michael,
>
> It would definitely be an option. I am not currently doing any testing
> like that; it could replace the ProcessorTopologyTestDriver-style testing
> that I'd like to do, but there are some trade-offs to consider:
>
> - I can't do an isolated test of just the TopologyBuilder; I'd be
> bringing in configuration management code (eg. configuring where to access
> ZK + Kafka).
> - Tests using a running Kafka server wouldn't have a clear end-point; if
> something in the topology doesn't publish a message where I expected it to,
> my test can only fail via a timeout.
> - Tests are likely to be slower; this might not be significant, but a
> small difference in test speed has a big impact in productivity after a
> few
> months of development
> - Tests will be more complex & fragile; some additional component needs
> to manage starting up that Kafka server, making sure it's ready-to-go,
> running tests, and then tearing it down
> - Tests will have to be cautious of state existing in Kafka. eg. two
> test suites that touch the same topics could be influenced by state of a
> previous test. Either you take a "destroy the world" approach between test
> cases (or test suites), which probably makes test speed much worse, or,
> you
> find another way to isolate test's state.
>
> I'd have to face all these problems at the higher level that I'm calling
> "systems-level tests", but, I think it would be better to do the majority
> of the automated testing at a lower level that doesn't bring these
> considerations into play.
>
> Mathieu
>
>
> On Mon, Aug 15, 2016 at 12:13 PM, Michael Noll 
> wrote:
>
> > Mathieu,
> >
> > follow-up question: Are you also doing or considering integration
> testing
> > by spawning a local Kafka cluster and then reading/writing to that
> cluster
> > (often called embedded or in-memory cluster)? This approach would be in
> > the middle between ProcessorTopologyTestDriver (that does not spawn a
> Kafka
> > cluster) and your system-level testing (which I suppose is running
> against
> > a "real" test Kafka cluster).
> >
> > -Michael
> >
> >
> >
> >
> >
> > On Mon, Aug 15, 2016 at 3:44 PM, Mathieu Fenniak <
> > mathieu.fenn...@replicon.com> wrote:
> >
> > > Hey all,
> > >
> > > At my workplace, we have a real focus on software automated testing.
> I'd
> > > love to be able to test the composition of a TopologyBuilder with
> > > org.apache.kafka.test.ProcessorTopologyTestDriver
> > >  > > 317b95efa4/streams/src/test/java/org/apache/kafka/test/
> > > ProcessorTopologyTestDriver.java>;
> > > has there ever been any thought given to making this part of the
> public
> > API
> > > of Kafka Streams?
> > >
> > > For some background, here are some details on the automated testing
> plan
> > > that I have in mind for a Kafka Streams application. Our goal is to
> > enable
> > > continuous deployment of any new development we do, so, it has to be
> > > rigorously tested with complete automation.
> > >
> > > As part of our pre-commit testing, we'd first have these gateways; no
> > code
> > > would reach our master branch without passing these tests:
> > >
> > > - At the finest level, unit tests covering individual pieces like a
> > > Serde, ValueMapper, ValueJoiner, aggregate adder/subtractor, etc.
> > These
> > > pieces are very isolated, very easy to unit test.
> > > - At a higher level, I'd like to have component tests of the
> > composition
> > > of the TopologyBuilder; this is where ProcessorTopologyTestDriver
> > would
> > > be
> > > valuable. There'd be far fewer of these tests than the lower-level
> > > tests.
> > > There are no external dependencies to these tests, so they'd be very
> > > fast.
> > >
> > > Having passed that level of testing, we'd deploy the Kafka Streams
> > > application to an integration testing area where the rest of our
> > > application is kept up-to-date, and proceed with these integration
> tests:
> > >
> > > - Systems-level tests where we synthesize inputs to the Kafka topics,
> > > wait for 

Re: Automated Testing w/ Kafka Streams

2016-08-15 Thread Guozhang Wang
Mathieu,

Your composition of Per-module Unit Tests + ProcessorTopologyTestDriver +
System Tests looks good to me, and I agree with you that since this is part
of your pre-commit process, which could be triggered concurrently from
different developers / teams, EmbeddedSingleNodeKafkaCluster +
EmbeddedZookeeper may not work best for you.


Guozhang


On Mon, Aug 15, 2016 at 1:39 PM, Radoslaw Gruchalski 
wrote:

> Out of curiosity, are you aware of kafka.utils.TestUtils and Apache Curator
> TestingServer?
> I’m using this successfully to test publish/consume scenarios with things
> like Flink, Spark and custom apps.
> What would stop you from taking the same approach?
>
> –
> Best regards,
> Radek Gruchalski
> ra...@gruchalski.com
>
>
> On August 15, 2016 at 9:41:37 PM, Mathieu Fenniak (
> mathieu.fenn...@replicon.com) wrote:
>
> Hi Michael,
>
> It would definitely be an option. I am not currently doing any testing
> like that; it could replace the ProcessorTopologyTestDriver-style testing
> that I'd like to do, but there are some trade-offs to consider:
>
> - I can't do an isolated test of just the TopologyBuilder; I'd be
> bringing in configuration management code (eg. configuring where to access
> ZK + Kafka).
> - Tests using a running Kafka server wouldn't have a clear end-point; if
> something in the topology doesn't publish a message where I expected it to,
> my test can only fail via a timeout.
> - Tests are likely to be slower; this might not be significant, but a
> small difference in test speed has a big impact in productivity after a few
> months of development
> - Tests will be more complex & fragile; some additional component needs
> to manage starting up that Kafka server, making sure it's ready-to-go,
> running tests, and then tearing it down
> - Tests will have to be cautious of state existing in Kafka. eg. two
> test suites that touch the same topics could be influenced by state of a
> previous test. Either you take a "destroy the world" approach between test
> cases (or test suites), which probably makes test speed much worse, or, you
> find another way to isolate test's state.
>
> I'd have to face all these problems at the higher level that I'm calling
> "systems-level tests", but, I think it would be better to do the majority
> of the automated testing at a lower level that doesn't bring these
> considerations into play.
>
> Mathieu
>
>
> On Mon, Aug 15, 2016 at 12:13 PM, Michael Noll 
> wrote:
>
> > Mathieu,
> >
> > follow-up question: Are you also doing or considering integration testing
> > by spawning a local Kafka cluster and then reading/writing to that
> cluster
> > (often called embedded or in-memory cluster)? This approach would be in
> > the middle between ProcessorTopologyTestDriver (that does not spawn a
> Kafka
> > cluster) and your system-level testing (which I suppose is running
> against
> > a "real" test Kafka cluster).
> >
> > -Michael
> >
> >
> >
> >
> >
> > On Mon, Aug 15, 2016 at 3:44 PM, Mathieu Fenniak <
> > mathieu.fenn...@replicon.com> wrote:
> >
> > > Hey all,
> > >
> > > At my workplace, we have a real focus on software automated testing.
> I'd
> > > love to be able to test the composition of a TopologyBuilder with
> > > org.apache.kafka.test.ProcessorTopologyTestDriver
> > >  > > 317b95efa4/streams/src/test/java/org/apache/kafka/test/
> > > ProcessorTopologyTestDriver.java>;
> > > has there ever been any thought given to making this part of the public
> > API
> > > of Kafka Streams?
> > >
> > > For some background, here are some details on the automated testing
> plan
> > > that I have in mind for a Kafka Streams application. Our goal is to
> > enable
> > > continuous deployment of any new development we do, so, it has to be
> > > rigorously tested with complete automation.
> > >
> > > As part of our pre-commit testing, we'd first have these gateways; no
> > code
> > > would reach our master branch without passing these tests:
> > >
> > > - At the finest level, unit tests covering individual pieces like a
> > > Serde, ValueMapper, ValueJoiner, aggregate adder/subtractor, etc.
> > These
> > > pieces are very isolated, very easy to unit test.
> > > - At a higher level, I'd like to have component tests of the
> > composition
> > > of the TopologyBuilder; this is where ProcessorTopologyTestDriver
> > would
> > > be
> > > valuable. There'd be far fewer of these tests than the lower-level
> > > tests.
> > > There are no external dependencies to these tests, so they'd be very
> > > fast.
> > >
> > > Having passed that level of testing, we'd deploy the Kafka Streams
> > > application to an integration testing area where the rest of our
> > > application is kept up-to-date, and proceed with these integration
> tests:
> > >
> > > - Systems-level tests where we synthesize inputs to the Kafka topics,
> > > wait for the Streams app to process the data, and then 

Re: Automated Testing w/ Kafka Streams

2016-08-15 Thread Radoslaw Gruchalski
Out of curiosity, are you aware of kafka.utils.TestUtils and Apache Curator
TestingServer?
I’m using this successfully to test publish/consume scenarios with things
like Flink, Spark and custom apps.
What would stop you from taking the same approach?

–
Best regards,
Radek Gruchalski
ra...@gruchalski.com


On August 15, 2016 at 9:41:37 PM, Mathieu Fenniak (
mathieu.fenn...@replicon.com) wrote:

Hi Michael,

It would definitely be an option. I am not currently doing any testing
like that; it could replace the ProcessorTopologyTestDriver-style testing
that I'd like to do, but there are some trade-offs to consider:

- I can't do an isolated test of just the TopologyBuilder; I'd be
bringing in configuration management code (eg. configuring where to access
ZK + Kafka).
- Tests using a running Kafka server wouldn't have a clear end-point; if
something in the topology doesn't publish a message where I expected it to,
my test can only fail via a timeout.
- Tests are likely to be slower; this might not be significant, but a
small difference in test speed has a big impact in productivity after a few
months of development
- Tests will be more complex & fragile; some additional component needs
to manage starting up that Kafka server, making sure it's ready-to-go,
running tests, and then tearing it down
- Tests will have to be cautious of state existing in Kafka. eg. two
test suites that touch the same topics could be influenced by state of a
previous test. Either you take a "destroy the world" approach between test
cases (or test suites), which probably makes test speed much worse, or, you
find another way to isolate test's state.
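
To make the "can only fail via a timeout" point in the list above concrete,
here is a minimal sketch of a bounded polling helper in plain Java. It is not
part of Kafka or of any library mentioned in this thread; the class name
BoundedWait and the method waitUntil are hypothetical, and the condition you
pass in would be whatever "the expected record has arrived" means in your
test.

```java
import java.time.Duration;
import java.util.function.BooleanSupplier;

// Hypothetical helper (not a Kafka API): poll a condition until it holds
// or a deadline passes, so a missing message fails the test after a bound.
final class BoundedWait {

    private BoundedWait() {}

    /**
     * Returns true if the condition became true before the timeout elapsed,
     * false if the timeout elapsed or the waiting thread was interrupted.
     * The condition is always checked at least once.
     */
    static boolean waitUntil(BooleanSupplier condition, Duration timeout, Duration pollInterval) {
        long deadline = System.nanoTime() + timeout.toNanos();
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false;
            }
            try {
                Thread.sleep(pollInterval.toMillis());
            } catch (InterruptedException e) {
                // Preserve the interrupt flag and give up waiting.
                Thread.currentThread().interrupt();
                return false;
            }
        }
    }
}
```

A test would call something like
BoundedWait.waitUntil(() -> received.contains(expected), Duration.ofSeconds(30), Duration.ofMillis(100))
and assert on the result. The trade-off described above still applies: a
failure costs the full timeout, which is one reason to keep such tests few.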

I'd have to face all these problems at the higher level that I'm calling
"systems-level tests", but, I think it would be better to do the majority
of the automated testing at a lower level that doesn't bring these
considerations into play.
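
For the tests that do run against a shared or embedded broker, one possible
way to "isolate test's state" without destroying the world between test
cases is to give every test its own topic names. The sketch below is only an
illustration in plain Java; TestTopics and uniqueName are hypothetical names,
not something shipped with Kafka.

```java
import java.util.UUID;

// Hypothetical helper (not a Kafka API): derive a fresh topic name per test
// so that leftover records from an earlier test cannot be observed.
final class TestTopics {

    private TestTopics() {}

    /**
     * Appends a random UUID to the given base name. The result uses only
     * letters, digits and hyphens, provided the base name does as well.
     */
    static String uniqueName(String base) {
        return base + "-" + UUID.randomUUID();
    }
}
```

Each test would create its input and output topics from such names during
setup. The cost is that topics accumulate on a long-lived broker, so this
suits an embedded cluster that is discarded after the run better than a
permanent one.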

Mathieu


On Mon, Aug 15, 2016 at 12:13 PM, Michael Noll 
wrote:

> Mathieu,
>
> follow-up question: Are you also doing or considering integration testing
> by spawning a local Kafka cluster and then reading/writing to that
cluster
> (often called embedded or in-memory cluster)? This approach would be in
> the middle between ProcessorTopologyTestDriver (that does not spawn a
Kafka
> cluster) and your system-level testing (which I suppose is running
against
> a "real" test Kafka cluster).
>
> -Michael
>
>
>
>
>
> On Mon, Aug 15, 2016 at 3:44 PM, Mathieu Fenniak <
> mathieu.fenn...@replicon.com> wrote:
>
> > Hey all,
> >
> > At my workplace, we have a real focus on software automated testing.
I'd
> > love to be able to test the composition of a TopologyBuilder with
> > org.apache.kafka.test.ProcessorTopologyTestDriver
> >  > 317b95efa4/streams/src/test/java/org/apache/kafka/test/
> > ProcessorTopologyTestDriver.java>;
> > has there ever been any thought given to making this part of the public
> API
> > of Kafka Streams?
> >
> > For some background, here are some details on the automated testing
plan
> > that I have in mind for a Kafka Streams application. Our goal is to
> enable
> > continuous deployment of any new development we do, so, it has to be
> > rigorously tested with complete automation.
> >
> > As part of our pre-commit testing, we'd first have these gateways; no
> code
> > would reach our master branch without passing these tests:
> >
> > - At the finest level, unit tests covering individual pieces like a
> > Serde, ValueMapper, ValueJoiner, aggregate adder/subtractor, etc.
> These
> > pieces are very isolated, very easy to unit test.
> > - At a higher level, I'd like to have component tests of the
> composition
> > of the TopologyBuilder; this is where ProcessorTopologyTestDriver
> would
> > be
> > valuable. There'd be far fewer of these tests than the lower-level
> > tests.
> > There are no external dependencies to these tests, so they'd be very
> > fast.
> >
> > Having passed that level of testing, we'd deploy the Kafka Streams
> > application to an integration testing area where the rest of our
> > application is kept up-to-date, and proceed with these integration
tests:
> >
> > - Systems-level tests where we synthesize inputs to the Kafka topics,
> > wait for the Streams app to process the data, and then inspect the
> > output
> > that it pushes into other Kafka topics. These tests will be fewer in
> > nature than the above tests, but they serve to ensure that the
> > application
> > is well-configured, executing, and handling inputs & outputs as
> > expected.
> > - UI-level tests where we verify behaviors that are expected from the
> > system as a whole. As our application is a web app, we'd be using
> > Selenium
> > to drive a web browser and verifying interactions and outputs that are
> > expected from the Streams application matching our real-world
> use-cases.
> > These tests are even fewer in nature than the above.
> >
> > This is an 

Re: Automated Testing w/ Kafka Streams

2016-08-15 Thread Mathieu Fenniak
Hi Michael,

It would definitely be an option.  I am not currently doing any testing
like that; it could replace the ProcessorTopologyTestDriver-style testing
that I'd like to do, but there are some trade-offs to consider:

   - I can't do an isolated test of just the TopologyBuilder; I'd be
   bringing in configuration management code (eg. configuring where to access
   ZK + Kafka).
   - Tests using a running Kafka server wouldn't have a clear end-point; if
   something in the topology doesn't publish a message where I expected it to,
   my test can only fail via a timeout.
   - Tests are likely to be slower; this might not be significant, but a
   small difference in test speed has a big impact on productivity after a few
   months of development.
   - Tests will be more complex & fragile; some additional component needs
   to manage starting up that Kafka server, making sure it's ready-to-go,
   running tests, and then tearing it down.
   - Tests will have to be cautious of state existing in Kafka.  eg. two
   test suites that touch the same topics could be influenced by state of a
   previous test.  Either you take a "destroy the world" approach between test
   cases (or test suites), which probably makes test speed much worse, or, you
   find another way to isolate each test's state.
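
To make the timeout point above concrete: against a running broker, a test
typically polls for the expected output and gives up after a deadline.  A
minimal sketch in plain Java, assuming a hypothetical helper named WaitUntil
(not part of Kafka); the lambda stands in for "the expected record showed up
on the output topic":

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

/** Hypothetical helper (not part of Kafka): polls a condition until it holds or a deadline passes. */
final class WaitUntil {

    private WaitUntil() {}

    /** Returns true as soon as the condition holds, or false once timeoutMs has elapsed. */
    static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long pollIntervalMs) {
        final long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
        while (true) {
            if (condition.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - deadline >= 0) {
                return false; // deadline reached without the condition ever holding
            }
            try {
                Thread.sleep(pollIntervalMs);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt for the caller
                return false;
            }
        }
    }

    public static void main(String[] args) {
        // Stand-in for output that arrives after a few polls.
        AtomicInteger polls = new AtomicInteger();
        boolean arrived = waitUntil(() -> polls.incrementAndGet() >= 3, 1_000, 10);
        System.out.println("arrived=" + arrived);

        // Nothing ever arrives: the elapsed timeout is the only failure signal.
        boolean never = waitUntil(() -> false, 200, 10);
        System.out.println("never=" + never);
    }
}
```

The second call shows the cost: a missing message is only detected after the
full timeout, and the test cannot tell "not published" from "not published yet".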

I'd have to face all these problems at the higher level that I'm calling
"systems-level tests", but, I think it would be better to do the majority
of the automated testing at a lower level that doesn't bring these
considerations into play.
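
For the tests that do run against a shared broker, one possible way to
isolate state without a "destroy the world" reset is to give every test its
own topic names.  A minimal sketch, assuming a hypothetical TestTopics helper
(not part of Kafka) and assuming the code under test accepts topic names as
parameters:

```java
import java.util.UUID;

/** Hypothetical helper (not part of Kafka): derives a fresh topic name per test. */
final class TestTopics {

    private TestTopics() {}

    /** Appends a random UUID so that two tests never share a topic, even across runs. */
    static String uniqueTopic(String baseName) {
        return baseName + "-" + UUID.randomUUID();
    }

    public static void main(String[] args) {
        // Each test asks for its own input and output topics.
        String input = uniqueTopic("orders-input");
        String output = uniqueTopic("orders-output");
        System.out.println(input);
        System.out.println(output);
    }
}
```

The UUID suffix uses only hex digits and hyphens, which are legal in Kafka
topic names.  The trade-off is that old topics accumulate on the broker until
something cleans them up.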

Mathieu




Re: Automated Testing w/ Kafka Streams

2016-08-15 Thread Michael Noll
Mathieu,

follow-up question:  Are you also doing or considering integration testing
by spawning a local Kafka cluster and then reading/writing to that cluster
(often called embedded or in-memory cluster)?  This approach would be in
the middle between ProcessorTopologyTestDriver (that does not spawn a Kafka
cluster) and your system-level testing (which I suppose is running against
a "real" test Kafka cluster).

-Michael





On Mon, Aug 15, 2016 at 3:44 PM, Mathieu Fenniak <
mathieu.fenn...@replicon.com> wrote:

> Hey all,
>
> At my workplace, we have a real focus on automated software testing.  I'd
> love to be able to test the composition of a TopologyBuilder with
> org.apache.kafka.test.ProcessorTopologyTestDriver
>  317b95efa4/streams/src/test/java/org/apache/kafka/test/
> ProcessorTopologyTestDriver.java>;
> has there ever been any thought given to making this part of the public API
> of Kafka Streams?
>
> For some background, here are some details on the automated testing plan
> that I have in mind for a Kafka Streams application.  Our goal is to enable
> continuous deployment of any new development we do, so, it has to be
> rigorously tested with complete automation.
>
> As part of our pre-commit testing, we'd first have these gateways; no code
> would reach our master branch without passing these tests:
>
>    - At the finest level, unit tests covering individual pieces like a
>    Serde, ValueMapper, ValueJoiner, aggregate adder/subtractor, etc.  These
>    pieces are very isolated, very easy to unit test.
>    - At a higher level, I'd like to have component tests of the composition
>    of the TopologyBuilder; this is where ProcessorTopologyTestDriver would
>    be valuable.  There'd be far fewer of these tests than the lower-level
>    tests.  There are no external dependencies to these tests, so they'd be
>    very fast.
>
> Having passed that level of testing, we'd deploy the Kafka Streams
> application to an integration testing area where the rest of our
> application is kept up-to-date, and proceed with these integration tests:
>
>    - Systems-level tests where we synthesize inputs to the Kafka topics,
>    wait for the Streams app to process the data, and then inspect the
>    output that it pushes into other Kafka topics.  These tests will be
>    fewer in number than the above tests, but they serve to ensure that the
>    application is well-configured, executing, and handling inputs &
>    outputs as expected.
>    - UI-level tests where we verify behaviors that are expected from the
>    system as a whole.  As our application is a web app, we'd be using
>    Selenium to drive a web browser and verifying interactions and outputs
>    that are expected from the Streams application matching our real-world
>    use-cases.  These tests are even fewer in number than the above.
>
> This is an adaptation of the automated testing scaffold that we currently
> use for microservices; I'd love any input on the plan as a whole.
>
> Thanks,
>
> Mathieu
>
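
The finest level in the quoted plan needs no Kafka at all, since pieces like a
ValueMapper are plain functions.  A minimal sketch, assuming a hypothetical
NormalizeCountryCode mapper; the ValueMapper interface below is a local
stand-in that mirrors the single-method shape of the Kafka Streams interface
so the example runs without the library on the classpath, and the main method
stands in for a test framework:

```java
import java.util.Locale;

/** Local stand-in with the same shape as Kafka Streams' ValueMapper (a single apply method). */
interface ValueMapper<V, VR> {
    VR apply(V value);
}

/** Hypothetical mapper, used only for illustration. */
final class NormalizeCountryCode implements ValueMapper<String, String> {
    @Override
    public String apply(String value) {
        // Trim surrounding whitespace and upper-case; nulls pass through unchanged.
        return value == null ? null : value.trim().toUpperCase(Locale.ROOT);
    }
}

final class NormalizeCountryCodeTest {

    static void check(boolean ok, String message) {
        if (!ok) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        ValueMapper<String, String> mapper = new NormalizeCountryCode();
        check("CA".equals(mapper.apply(" ca ")), "trims and upper-cases");
        check(mapper.apply(null) == null, "passes null through");
        System.out.println("all checks passed");
    }
}
```

Because the mapper has no dependency on a topology or a broker, a test like
this has a clear end-point and no timeout.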