Re: [DISCUSS] KIP-456: Helper classes to make it simpler to write test logic with TopologyTestDriver

Jukka Karvanen Tue, 07 May 2019 01:53:05 -0700

Thanks John for feedback.

General comment first, which affect many of the individual answers.
My current approach has been abstracting the current functionality of
TopologyTestDriver and ConsumerRecordFactory
and that way the Contructors and pipe-methods signatures of TestInputTopic are
replicated from ConsumerRecordFactory.
Main addition are:
-constructor versions with serde instead of serializer / deserializer
-TestInputTopic possibility to feed in Value list instead of only
KeyValueList
-TestOutputTopic methods to unwrap  ProducerRecord and return collections
of unwrapped key values and values


The JavaDoc of current implementation can be found:
https://jukkakarvanen.github.io/kafka-streams-test-topics/
The classes and methods are identical even the package name is different.
Also these JavaDoc comments are replicating a lot of descriptions from
existing class methods JavaDocs..

I was planning to publish these classes as separate artifact so these can
be used also with older Kafka versions.
In my test with Kafka version 1.1, only the method including Headers
failed, all other functionality was working without modifications.


>InputTopic:
>
>1. Have you considered adding a ProducerRecord input method to the input
>topic? This might be unnecessary, given the overloads you provide. I'm
>wondering if it decreases the domain mismatch between TopologyTestDriver
>and KafkaStreams, though, since in production code, you send input to the
>app as a ProducerRecord, via a topic. Also, this might let you drop some of
>the less mainstream overloads, like the ones for headers.

Ok, so not same methods as in TopologyTestDriver:
pipeInput(ConsumerRecord<byte[], byte[]> consumerRecord);
pipeInput(List<ConsumerRecord<byte[], byte[]>> records);
but instead:
pipeInput(ProducerRecord <K, V]> record);
pipeInput(List< ProducerRecord  <K, V]>> records);
Which would convert ProductRecords to ConsumerRecords before piping to TTD

and Drop these methods
    void pipeInput(K key, V value, Headers headers);
    void pipeInput(K key, V value, Headers headers, long timestampMs);

In this case you would not able to create those with ConsumerRecordFactory,
but needing to create  ProducerRecord   objects directly.one by one or with
some helper methods.

Generally I was hoping that ProducerRecord and ConsumerRecord would have
inherited to same base class or implemented some kind KafkaRecord
interface, but
it is not the case now. So in the most of  case it would have enough to
pass KafkaRecord instead of ProducerRecord like in OutputVerifier now.

What do you think?

>2. On the "pipeList" methods, it seems like the "start,advance" timestamp
>approach forces me to do a little mental math, if I actually want to
>achieve some specific timestamp per record, or to be able to verify the
>result, given specific timestamps as input. Did you consider a
>KeyValueTimestamp value type instead? Alternatively, if you like the
>ProducerRecord approach, above, you could lean on that instead.

This start + advance concept is coming directly from ConsumerRecordFactory.
I don't see value of adding KeyValueTimestamp class.

My own approach has been piping in the seeds with list and pipe special
times with:
void pipeInput(K key, V value, long timestampMs);

This start + n*interval approach will fit the normal happy case scenarios
and
for timing relating test special cases I have used pipe single record at a
time.

And that ProducerRecord approach is one alternative, but you end up
generating that list some special methods anyway,
which could also pipe those in one by one.

>
>3. I wasn't clear on the semantics of the constructors that take a start
>timestamp, but no advance time. I also wasn't clear on the semantics when
>the constructor specifies start/advance, but then we also call the input
>methods that specify timestamps, or start/advance timestamps. Also related,
>what's the "default" timestamp, if no start is specified, "zero" or "now"
>both seem reasonable. Similar with the advance, "1ms" and "0ms" both seem
>reasonable defaults.

This is also based on the implementation of ConsumerRecordFactory, which
seems to
set the base time with System.currentTimeMillis() if not set,advanceMs is 0
and
you can overwrite these timestamp providing it with the method call.

>
>OutputTopic:
>
>4. Tentatively, ProducerRecord seems like a strange output type, since as a
>user, I'm "consuming" the results. How about using ConsumerRecord instead?

This is also directly from TTD. Also for this that KafkaRecord concept
would be better, but would
require alignment with existing classes, too.

>
>5. We have methods above for producing with specific timestamps, but none
>for observing the timestamps. How can we strengthen the symmetry?

My own experience with testing with TTD.
Those timestamp are important when feeding in the data, so can test for
example
timewindows and joins with left side or right side arriving first.

For the result verification, I have not so far checked the Kafka timestamps.
Timewindow related information is available in the window key and the joins
the output need to be verified, not actual timestamp of it.
So for those case where you need it, those could be checked with
fetching ProducerRecord
one by one.

What do you think?

>
>6. (just a comment) I like the readToMap method. Thanks!
>
Good.

>7. I know it clashes somewhat with the Kafka semantics, but I'm a little
>concerned about null semantics in the output topic. In particular,
>"readValue()" returning null ambiguously indicates both "there is no value"
>and "there is a null value present". Can we consider throwing a
>NoSuchElementException when you attempt to read, but there is nothing
there?

Yes, that readValue with null, is little confusing.
If you need to check value equal null, you need to fetch KeyValue with
readKeyValue instead.
Does it solve the same problem?

Also what I noticed from the current implementation of TTD, that if you
pipe in to non-existing topic,
it generated Exception, but reading from non-existing topic return only
null.

So getting null from readValue can mean you have mistyped the topic name,
there are no record to consumer or value of it is null.

>
>7.5. This would necessitate adding a boolean method to query the presence
>of output records, which can be a handy way to cap off a test:
>`assertFalse(outputTopic.hasRecords(),
>outputTopic.readKeyValuesToList().toString())` would fail and print out the
>remaining records if there are any.

Yes, I have thinking about the same, but so far I have used something like

assertThat(outputTopic.readRecord(), nullValue());

which will output the next record if it is failing.

>
>General:
>
>8. Can the TTD, input, and output topics implement AutoCloseable, to
>facilitate try-with-resources in tests?
>
>Example:
>try (driver = new TTD(), input = new TestInputTopic(), output = new
>TestOutputTopic() ) {
> ...
>} // ensures everything is closed
>
In the current implementation Topics does not have any state and do not
need the closing.


>7. Should we add some assertion that all the output is consumed when you
>call testDriver.close() ?

Not by default at least. There are scenarios where you validate only
OutputTopic even the stream is generating others.
Maybe some kind of optionally called boolean method to check all topics in
TTD might be good.
>
>8. Should we consider deprecating (some or all) of the overlapping
>mechanisms in TopologyTestDriver? It seems like this might improve
>usability by reducing ambiguity.
The most of the methods do not overlap, only those Consumer/ProducerRecord
based ones, but
the problem with the current implementation is you can deprecate those
directly, because they are used by these classes.
So there would be need add some other non public methods to be used by
Topic classes in this case.
Also relating to it, these are now in org.apache.kafka.streams.test package
not in same package as TTD in org.apache.kafka.streams.

>
>9. Can you give us some idea of the javadoc for each of the new methods
>you're proposing? The documentation is also part of the public API, and it
>also helps us understand the semantics of the operations you're
proposing.
>

My current version of JavaDoc can be found, here:
https://jukkakarvanen.github.io/kafka-streams-test-topics/

Also I made proposal for Developer Guide Testing changes to
docs/streams/developer-guide/testing.html
<https://github.com/apache/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics#diff-3e6e61eaf18f0d35cdb94ed705581b55>
:
https://github.com/apache/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics

(Not sure is there better way to check those, than in Github diff.)

So, thanks for the feedback and I am happy to change those things you see
viable after my comments.

Jukka

ma 6. toukok. 2019 klo 19.41 John Roesler (j...@confluent.io) kirjoitti:

> Thanks for the KIP, Jukka! Sorry it took so long for me to review it.
>
> Just a few questions, in no particular order:
>
> InputTopic:
>
> 1. Have you considered adding a ProducerRecord input method to the input
> topic? This might be unnecessary, given the overloads you provide. I'm
> wondering if it decreases the domain mismatch between TopologyTestDriver
> and KafkaStreams, though, since in production code, you send input to the
> app as a ProducerRecord, via a topic. Also, this might let you drop some of
> the less mainstream overloads, like the ones for headers.
>
> 2. On the "pipeList" methods, it seems like the "start,advance" timestamp
> approach forces me to do a little mental math, if I actually want to
> achieve some specific timestamp per record, or to be able to verify the
> result, given specific timestamps as input. Did you consider a
> KeyValueTimestamp value type instead? Alternatively, if you like the
> ProducerRecord approach, above, you could lean on that instead.
>
> 3. I wasn't clear on the semantics of the constructors that take a start
> timestamp, but no advance time. I also wasn't clear on the semantics when
> the constructor specifies start/advance, but then we also call the input
> methods that specify timestamps, or start/advance timestamps. Also related,
> what's the "default" timestamp, if no start is specified, "zero" or "now"
> both seem reasonable. Similar with the advance, "1ms" and "0ms" both seem
> reasonable defaults.
>
> OutputTopic:
>
> 4. Tentatively, ProducerRecord seems like a strange output type, since as a
> user, I'm "consuming" the results. How about using ConsumerRecord instead?
>
> 5. We have methods above for producing with specific timestamps, but none
> for observing the timestamps. How can we strengthen the symmetry?
>
> 6. (just a comment) I like the readToMap method. Thanks!
>
> 7. I know it clashes somewhat with the Kafka semantics, but I'm a little
> concerned about null semantics in the output topic. In particular,
> "readValue()" returning null ambiguously indicates both "there is no value"
> and "there is a null value present". Can we consider throwing a
> NoSuchElementException when you attempt to read, but there is nothing
> there?
>
> 7.5. This would necessitate adding a boolean method to query the presence
> of output records, which can be a handy way to cap off a test:
> `assertFalse(outputTopic.hasRecords(),
> outputTopic.readKeyValuesToList().toString())` would fail and print out the
> remaining records if there are any.
>
> General:
>
> 8. Can the TTD, input, and output topics implement AutoCloseable, to
> facilitate try-with-resources in tests?
>
> Example:
> try (driver = new TTD(), input = new TestInputTopic(), output = new
> TestOutputTopic() ) {
>  ...
> } // ensures everything is closed
>
> 7. Should we add some assertion that all the output is consumed when you
> call testDriver.close() ?
>
> 8. Should we consider deprecating (some or all) of the overlapping
> mechanisms in TopologyTestDriver? It seems like this might improve
> usability by reducing ambiguity.
>
> 9. Can you give us some idea of the javadoc for each of the new methods
> you're proposing? The documentation is also part of the public API, and it
> also helps us understand the semantics of the operations you're proposing.
>
> That's all! Thanks so much for this awesome proposal,
> -John
>
> On Mon, May 6, 2019 at 6:58 AM Jukka Karvanen <jukka.karva...@jukinimi.com
> >
> wrote:
>
> > Hi,
> >
> > Now everything what I planned to add including tests, samples and
> document
> > changes are in my branch:
> >
> >
> https://github.com/jukkakarvanen/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics
> >
> >
> > So I can create PR as soon as this KIP is getting green light to proceed.
> >
> > Jukka
> >
> > la 4. toukok. 2019 klo 9.05 Jukka Karvanen (jukka.karva...@jukinimi.com)
> > kirjoitti:
> >
> > > Hi,
> > >
> > > New TestInputTopic and TestOutputTopic included to Developer guide
> > testing
> > > page as alternative,
> > > The old way with ConsumerRecordFactory and OutputVerifier is not
> removed.
> > >
> > > You can see the proposal here in my branch:
> > >
> > >
> >
> https://github.com/jukkakarvanen/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics
> > >
> > >
> > > I can create Work In progress pull request if that make commenting
> > > proposal easier.
> > > Still planning to add full coverage unit test and sample
> > WordCountDemoTest to
> > >
> >
> streams/examples/src/test/java/org/apache/kafka/streams/examples/wordcount,
> > > if this KIP is accepted.
> > >
> > > Jukka
> > >
> > >
> > > ti 30. huhtik. 2019 klo 13.59 Matthias J. Sax (matth...@confluent.io)
> > > kirjoitti:
> > >
> > >> KIP-451 was discarded in favor this this KIP. So it seems we are all
> on
> > >> the same page.
> > >>
> > >>
> > >> >> Relating to KIP-448.
> > >> >> What kind of alignment did you think about?
> > >>
> > >> Nothing in particular. Was more or less a random though. Maybe there
> is
> > >> nothing to be aligned. Just wanted to bring it up. :)
> > >>
> > >>
> > >> >> Some thoughts after reading also the comments in KAFKA-6460:
> > >> >> To my understand these KIP-448 mock classes need to be fed somehow
> > into
> > >> >> TopologyTestDriver.
> > >> >> I don't know how those KIP-448 mock are planned to be set to
> > >> >> TopologyTestDriver.
> > >>
> > >> KIP-448 is still quite early, and it's a little unclear... Maybe we
> > >> should just ignore it for now. Sorry for the distraction with my
> comment
> > >> about it.
> > >>
> > >>
> > >> Please give me some more time to review this KIP in detail and I will
> > >> follow up later again.
> > >>
> > >>
> > >> -Matthias
> > >>
> > >> On 4/26/19 2:25 PM, Jukka Karvanen wrote:
> > >> > Yes, my understanding was also that this KIP cover all the
> requirement
> > >> of
> > >> > KIP-451.
> > >> >
> > >> > Relating to KIP-448.
> > >> > What kind of alignment did you think about?
> > >> >
> > >> > Some thoughts after reading also the comments in KAFKA-6460:
> > >> > To my understand these KIP-448 mock classes need to be fed somehow
> > into
> > >> > TopologyTestDriver.
> > >> > I don't know how those KIP-448 mock are planned to be set to
> > >> > TopologyTestDriver.
> > >> >
> > >> > On contrast KIP-456 was planned to be on top of unmodified
> > >> > TopologyTestDriver and now driver is set to TestInputTopic and
> > >> > TestOutputTopic in constructor.
> > >> > There are also alternative that these Topic object could be get from
> > >> > TopologyTestDriver, but it would require the duplicating the
> > >> constructors
> > >> > of Topics as methods to
> > >> > TopologyTestDriver.
> > >> >
> > >> > Also related to those Store object when going through the methods in
> > >> > TopologyTestDriver I noticed accessing the state stores could be be
> > the
> > >> > third candidate for helper class or a group of classes.
> > >> > So addition to have TestInputTopic and TestOutputTopic, it could be
> > also
> > >> > TestKeyValueStore, TestWindowStore, ... to follow the logic in this
> > >> > KPI-456, but
> > >> > it does add not any functionality on top of .already existing
> > >> functionality
> > >> > *Store classes. So that's why I did not include those.
> > >> >
> > >> > Jukka
> > >> > -
> > >> >
> > >> >
> > >> >
> > >> >
> > >> >
> > >> > Not
> > >> >
> > >> > pe 26. huhtik. 2019 klo 12.03 Matthias J. Sax (
> matth...@confluent.io)
> > >> > kirjoitti:
> > >> >
> > >> >> Btw: there is also KIP-448. I was wondering if it might be required
> > or
> > >> >> helpful to align the design of both with each other. Thoughts?
> > >> >>
> > >> >>
> > >> >>
> > >> >> On 4/25/19 11:22 PM, Matthias J. Sax wrote:
> > >> >>> Thanks for the KIP!
> > >> >>>
> > >> >>> I was just comparing this KIP with KIP-451 (even if I did not yet
> > >> make a
> > >> >>> sorrow read over this KIP), and I agree that there is a big
> overlap.
> > >> It
> > >> >>> seems that KIP-456 might subsume KIP-451.
> > >> >>>
> > >> >>> Let's wait on Patrick's input to see how to proceed.
> > >> >>>
> > >> >>>
> > >> >>> -Matthias
> > >> >>>
> > >> >>> On 4/25/19 12:03 AM, Jukka Karvanen wrote:
> > >> >>>> Hi,
> > >> >>>>
> > >> >>>> If you want to see or test the my current idea of the
> > implementation
> > >> of
> > >> >>>> this KIP, you can check it out in my repo:
> > >> >>>>
> > >> >>
> > >>
> >
> https://github.com/jukkakarvanen/kafka/compare/trunk...jukkakarvanen:KAFKA-8233_InputOutputTopics
> > >> >>>>
> > >> >>>>
> > >> >>>> After my test with KPI-451  I do not see need for add methods for
> > >> >>>> Iterables, but waiting Patrick's clarification of the use case.
> > >> >>>>
> > >> >>>> Jukka
> > >> >>>>
> > >> >>>>
> > >> >>>> ti 23. huhtik. 2019 klo 15.37 Jukka Karvanen (
> > >> >> jukka.karva...@jukinimi.com)
> > >> >>>> kirjoitti:
> > >> >>>>
> > >> >>>>> Hi All,
> > >> >>>>>
> > >> >>>>> I would like to start the discussion on KIP-456: Helper classes
> to
> > >> >> make it
> > >> >>>>> simpler to write test logic with TopologyTestDriver:
> > >> >>>>>
> > >> >>>>>
> > >> >>>>>
> > >> >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-456%3A+Helper+classes+to+make+it+simpler+to+write+test+logic+with+TopologyTestDriver
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> There is also related KIP adding methods to TopologyTestDriver:
> > >> >>>>>
> > >> >>>>>
> > >> >>
> > >>
> >
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-451%3A+Make+TopologyTestDriver+output+iterable
> > >> >>>>>
> > >> >>>>>
> > >> >>>>> I added those new Iterable based methods to this TestOutputTopic
> > >> even
> > >> >> not
> > >> >>>>> tested those myself yet.
> > >> >>>>> So this version contains both my original List and Map based
> > methods
> > >> >> and
> > >> >>>>> these new one.
> > >> >>>>> Based on the discussion some of these can be dropped, if those
> are
> > >> >> seen as
> > >> >>>>> redundant.
> > >> >>>>>
> > >> >>>>> Best Regards,
> > >> >>>>> Jukka
> > >> >>>>>
> > >> >>>>>
> > >> >>>>
> > >> >>>
> > >> >>
> > >> >>
> > >> >
> > >>
> > >>
> >
>

Re: [DISCUSS] KIP-456: Helper classes to make it simpler to write test logic with TopologyTestDriver

Reply via email to