Re: [DISCUSS] KIP-451: Make TopologyTestDriver output iterable

Matthias J. Sax Thu, 25 Apr 2019 23:20:00 -0700

Thanks for the discussion!

After skipping over KIP-456, I was also wondering what the overlap of
both KIPs are, and if KIP-456 might actually subsume KIP-451.


If we all agree on this, I would recommend to discard KIP-451 in favor
of KIP-456.

I'll also follow up on KIP-456 DISCUSS thread to get some input about
the overlap from their point of view.


-Matthias

On 4/25/19 2:35 AM, Patrik Kleindl wrote:
> Hi
> 
> As discussed, if the preferred option is to consume the records always I
> will change both methods in KIP-451 accordingly and also switch them to
> return a List.
> This would be a bit redundant with Jukkas proposal in KIP-456 so the
> question is if KIP-451 should be scraped in favor of KIP-456 which has more
> powerful solution but will also need a bit more changes in tests.
> On the other hand both are useful and wouldn't conflict as far as I can see.
> 
> Any opinions?
> 
> best regards
> 
> Patrik
> 
> On Thu, 25 Apr 2019 at 08:55, Jukka Karvanen <jukka.karva...@jukinimi.com>
> wrote:
> 
>> Hi,
>>
>> I played around with Patrick's KAFKA-8200 branch and I tested it with
>> combined with my draft version of KIP-456.
>>
>> Some comments:
>> These two version of iterableOutput methods are working now differently, if
>> you reuse same fetched Iterable object after piping in new inputs.
>> Version without serde will see the new input, but version with serdes has
>> streamed the converted items already to new list and that's why
>> not seeing the new item. Maybe it is intended to to fetch new Iterable each
>> time, but the implementation is not mandating it.
>>
>> See example:
>>
>> https://github.com/jukkakarvanen/kafka/blob/KAFKA-8200withKIP-456/streams/test-utils/src/test/java/org/apache/kafka/streams/TopologyTestDriverIterableTest.java
>>
>>
>> I have a lot of tests where I pipe the list of input and check list of
>> output items, pipe more and check the new list.
>> Now with this Iterable from the beginning is not very usable if you test
>> like this in multiple batches. You need to reiterate same again.
>>
>> In KIP-456 readKeyValuesToList returns List same way ConsumerRecordFactory
>> and that way this TestInputTopic is accepting List as input.
>> This collection methods in TestOutputTopic are also consuming the messages.
>> So you can mix the reading individual rows and collections.
>> With List it is also easier to get the number of outputs compared to
>> Iterable.
>>
>> Please, check out also DISCUSSion of KIP.456. I will post there the link to
>> the current version of implementation and you can see if it fulfill also
>> your need.
>>
>> Jukka
>>
>>
>> la 20. huhtik. 2019 klo 1.11 Patrik Kleindl (pklei...@gmail.com)
>> kirjoitti:
>>
>>> Hi Matthias
>>> Seems I got a bit ahead of myself.
>>> With option C my aim was a simple alternative which gives back all output
>>> records that have happened up to this point (and which have not been
>>> removed by calls to readOutput).
>>> Based on that the user can decide how to step through or compare the
>>> records.
>>>
>>> If you see it as more consistent if the new methods removed all returned
>>> records then this can easily be done.
>>>
>>> But maybe the pick of Iterable was too narrow.
>>> It would probably be a good fit to return a List or just a Collection
>>>
>>> Picking up John's naming suggestion this would make this:
>>>
>>> public Collection<ProducerRecord<byte[], byte[]>> readAllOutput(final
>>> String topic) {
>>>     final Collection<ProducerRecord<byte[], byte[]>> outputRecords =
>>> outputRecordsByTopic.get(topic);
>>>     if (outputRecords == null) {
>>>         return Collections.emptyList();
>>>     }
>>>     outputRecordsByTopic.put(topic, new LinkedList<>());
>>>     return outputRecords;
>>> }
>>>
>>> With the semantics the same as readOutput = removing everything.
>>>
>>> Can be changed to a List if you think it matters that a user can query
>>> some index directly.
>>>
>>> What do you think?
>>>
>>> best regards
>>>
>>> Patrik
>>>
>>>
>>>
>>> On Fri, 19 Apr 2019 at 03:04, Matthias J. Sax <matth...@confluent.io>
>>> wrote:
>>>
>>>> I am not sure if (C) is the best option to pick.
>>>>
>>>> What is the reasoning to suggest (C) over the other options?
>>>>
>>>> It seems that users cannot clear buffered output using option (C). This
>>>> might it make difficult to write tests.
>>>>
>>>> The original Jira tickets suggest:
>>>>
>>>>> which returns either an iterator or list over the records that are
>>>> currently available in the topic
>>>>
>>>> This implies that the current buffer would be cleared when getting the
>>>> iterator.
>>>>
>>>> Also, from my understanding, the idea of iterating in general, is to
>>>> step through a finite collection of objects/elements. Hence, if
>>>> `hasNext()` returns `false` is will never return `true` later on.
>>>>
>>>> As John mentioned, Java also has support for streams, that offer
>>>> different semantics, that would align with option (C). However, I am
>> not
>>>> sure if this would be the test API to write tests?
>>>>
>>>> Thoughts?
>>>>
>>>> In any way: whatever semantics we pick, the KIP should explain them.
>>>> Atm, this part is missing in the KIP.
>>>>
>>>>
>>>> -Matthias
>>>>
>>>> On 4/18/19 12:47 PM, Patrik Kleindl wrote:
>>>>> Hi John
>>>>>
>>>>> Thanks for your feedback
>>>>> It's C, it does not consume the messages in contrast to the
>> readOutput.
>>>>> Is it a requirement to do so?
>>>>> That's why I picked a different name so the difference is more
>>>> noticeable.
>>>>> I will add that to the JavaDoc.
>>>>>
>>>>> I see your point regarding future changes, that's why I linked
>> KIP-456
>>>>> where such a method is proposed and would maybe allow to deprecate my
>>>>> version in favor of a bigger solution.
>>>>>
>>>>> Hope that answers your questions
>>>>>
>>>>> best regards
>>>>> Patrik
>>>>>
>>>>>
>>>>> On Thu, 18 Apr 2019 at 19:46, John Roesler <j...@confluent.io>
>> wrote:
>>>>>
>>>>>> Hi, Patrik,
>>>>>>
>>>>>> Thanks for this proposal!
>>>>>>
>>>>>> I have one question, which I didn't see addressed by the KIP.
>>> Currently,
>>>>>> when you call `readOutput`, it consumes the result (removes it from
>>> the
>>>>>> test driver's output). Does your proposed method:
>>>>>> A: consume the whole output stream for that topic "atomically" when
>> it
>>>>>> returns the iterable? (i.e., two calls in a row would guarantee the
>>>> second
>>>>>> call is always an empty iterable?)
>>>>>> B: consume each record when we iterate over it? (i.e., this is like
>> a
>>>>>> stream) If this is the case, is the returned object iterable once
>>>> (uncached
>>>>>> stream), or could we iterate over it repeatedly (cached stream)?
>>>>>> C: not consume at all? (i.e., this is a view on the output topic,
>> but
>>> we
>>>>>> need a separate method to consume/clear the output)
>>>>>> D: something else?
>>>>>>
>>>>>> Also, one suggestion: maybe name the method "readAllOutput" or
>>>> something.
>>>>>> Specifically naming it "iterable" makes it awkward if we do want to
>>>> tighten
>>>>>> the return type (e.g., to List) in the future. This is something we
>>> may
>>>>>> actually want to do, if there's an easy way to say, "assert that the
>>>> output
>>>>>> equals [...some literal list...]".
>>>>>>
>>>>>> Thanks again!
>>>>>> -John
>>>>>>
>>>>>> On Wed, Apr 17, 2019 at 4:01 PM Patrik Kleindl <pklei...@gmail.com>
>>>> wrote:
>>>>>>
>>>>>>> Hi all
>>>>>>>
>>>>>>> Unless someone has objections I will start a VOTE thread tomorrow.
>>>>>>> The KIP adds two methods to the TopologyTestDriver and has no
>>> conflicts
>>>>>> for
>>>>>>> existing users.
>>>>>>> PR https://github.com/apache/kafka/pull/6556 is already being
>>>> reviewed.
>>>>>>>
>>>>>>> Side-note:
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-456%3A+Helper+classes+to+make+it+simpler+to+write+test+logic+with+TopologyTestDriver
>>>>>>> will
>>>>>>> provide a much larger solution for the TopologyTestDriver, but is
>>> just
>>>>>>> starting the discussion.
>>>>>>>
>>>>>>> best regards
>>>>>>>
>>>>>>> Patrik
>>>>>>>
>>>>>>> On Thu, 11 Apr 2019 at 22:14, Patrik Kleindl <pklei...@gmail.com>
>>>> wrote:
>>>>>>>
>>>>>>>> Hi Matthias
>>>>>>>>
>>>>>>>> Thanks for the questions.
>>>>>>>>
>>>>>>>> Regarding the return type:
>>>>>>>> Iterable offers the option of being used in a foreach loop
>> directly
>>>> and
>>>>>>> it
>>>>>>>> gives you access to the .iterator method, too.
>>>>>>>> (ref:
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>> https://www.techiedelight.com/differences-between-iterator-and-iterable-in-java/
>>>>>>>> )
>>>>>>>>
>>>>>>>> To return a List object would require an additional conversion
>> and I
>>>>>>> don't see the immediate benefit.
>>>>>>>>
>>>>>>>> Regarding the ordering:
>>>>>>>> outputRecordsByTopic gives back a Queue
>>>>>>>>
>>>>>>>> private final Map<String, Queue<ProducerRecord<byte[], byte[]>>>
>>>>>>> outputRecordsByTopic = new HashMap<>();
>>>>>>>>
>>>>>>>> which has a LinkedList behind it
>>>>>>>>
>>>>>>>> outputRecordsByTopic.computeIfAbsent(record.topic(), k -> new
>>>>>>> LinkedList<>()).add(record);
>>>>>>>>
>>>>>>>> So the order is handled by the linked list and should not be
>>> modified
>>>>>> by
>>>>>>>> my changes,
>>>>>>>> not even the .stream.map etc. (ref:
>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>> https://stackoverflow.com/questions/30258566/java-stream-map-and-collect-order-of-resulting-container
>>>>>>>> )
>>>>>>>>
>>>>>>>>
>>>>>>>> Then again, I am open to change it if people have some strong
>>>>>> preference
>>>>>>>>
>>>>>>>> best regards
>>>>>>>>
>>>>>>>> Patrik
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, 11 Apr 2019 at 17:45, Matthias J. Sax <
>>> matth...@confluent.io>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Thanks for the KIP!
>>>>>>>>>
>>>>>>>>> Overall, this makes sense and can simplify testing.
>>>>>>>>>
>>>>>>>>> What I am wondering is, why you suggest to return an `Iterable`?
>>>> Maybe
>>>>>>>>> returning an `Iterator` would make more sense? Or a List? Note
>> that
>>>>>> the
>>>>>>>>> order of emits matters, thus returning a generic `Collection`
>> would
>>>>>> not
>>>>>>>>> seem to be appropriate.
>>>>>>>>>
>>>>>>>>> Can you elaborate on the advantages to use `Iterable` compared to
>>> the
>>>>>>>>> other options?
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> -Matthias
>>>>>>>>>
>>>>>>>>> On 4/11/19 2:09 AM, Patrik Kleindl wrote:
>>>>>>>>>> Hi everyone,
>>>>>>>>>>
>>>>>>>>>> I would like to start the discussion on this small enhancement
>> of
>>>>>>>>>> the TopologyTestDriver.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>>
>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-451%3A+Make+TopologyTestDriver+output+iterable
>>>>>>>>>>
>>>>>>>>>> Pull request is available at
>>>>>>> https://github.com/apache/kafka/pull/6556
>>>>>>>>>>
>>>>>>>>>> Any feedback is welcome
>>>>>>>>>>
>>>>>>>>>> best regards
>>>>>>>>>>
>>>>>>>>>> Patrik
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>

signature.asc
Description: OpenPGP digital signature

Re: [DISCUSS] KIP-451: Make TopologyTestDriver output iterable

Reply via email to