Re: [DISCUSS] KIP-598: Augment TopologyDescription with store and source / sink serde information

Bruno Cadonna Mon, 11 May 2020 08:57:09 -0700

Hi Matthias,

Consuming a structured `TopologyDescription` object is not always the
easiest option. If you want to use a topology description outside
Kafka Streams, for instance to visualize it like with
https://zz85.github.io/kafka-streams-viz/, a future-proof structured
serialization format comes in handy. Another interesting application
for the topology description would be to compare topologies as
proposed in https://issues.apache.org/jira/browse/KAFKA-8307.


Best,
Bruno

On Fri, May 8, 2020 at 11:15 PM Matthias J. Sax <mj...@apache.org> wrote:
>
> Thanks for the reply Guozhang.
>
>
> About the change to `toString()`: I don't think we need to have any
> concerns about compatibility, because nobody should rely on the string
> representation of an object. Adding `toJson()` is fine with me even if I
> don't see much value: we expose a structured `TopologyDescription`
> already and I would leave it to the user to translate it into a
> different format if they need to.
>
>
>
> About KAFKA-9913:
>
> Some newly added methods are not marked as `<---- NEW FUNC` in the KIP
> so I missed them. (In general, it might be best to omit existing methods
> and only include newly added ones. Makes the KIP easier to read.)
>
> Traversing the `TopologyDescription` as your code example shows would be
> an improvement to the current API. However, still seems to be "clumsy"
> and more code to write for users.
>
> I would still argue that returning a list of all topics names to lift
> the burden from users to write those nested loops would be helpful. For
> pattern subscription and sink-topic-extractors we have multiple design
> choices:
>
>  - simply exclude them
>  - include patterns as their pattern-string
>  - include sink-topic-extractors with a predefined format like,
> "Dynamic: <used-class-name"
>
> We could alse be more flexible and allow users to includes/exclude
> patter-sources/dynamic-sink or add methods that only return pattern
> sources or dynamic sinks.
>
> Thoughts?
>
>
>
> About Serdes key/value vs list: if we really want to be generic, I like
> Bruno's idea to use a map. We should not only apply this to stores, but
> also for source and sink topics (for consistency) though.
>
>
>
> About `build()` vs `build(Props)` -- both return a `Topology` and thus
> `TopologyDescription` seems to be independent of which one was called.
> Of course, if we use `build(Props)` we could add more serde information
> into the `Topology` but this seems to be orthogonal to this KIP? Just
> returning a `null` string for unknown serdes might be simplest (and we
> document that `null` means, fall back to whatever `StreamsConfig`
> specifies).
>
> (Btw: as proposed in KIP-591, we might deprecate `build()` in favor of
> `build(Props)` anyway.)
>
>
>
> About wrapping serdes: it would be good to know in advance if/how we can
> get the information about inner serdes. I am not sure atm if we would
> need more API changes to get this info. The other (minor) question is
> also, how this information would be presented to the use (as we only use
> `String` types for Serde information.
>
>
>
> -Matthias
>
>
> On 5/4/20 11:27 AM, Bruno Cadonna wrote:
> > Hi Guozhang,
> >
> > Thank you for the KIP!
> >
> > Exposing also the inner types of the wrapper serdes would be
> > important. For debugging as Matthias has already mentioned and to see
> > more easily changes that are applied to a topology.
> >
> > I am also +1 on the `toJson()` method to easily access the topology
> > description programmatically and to make the description backward
> > compatible.
> >
> > Regarding `List<String> serdeNames();`, I would be in favour of a more
> > expressive return type, like a Map that assigns labels to Serde names.
> > For example, for key and value serdes the label could be "key" and
> > "value". Or something similar.
> >
> > Best,
> > Bruno
> >
> >
> >
> > On Mon, May 4, 2020 at 2:25 AM Guozhang Wang <wangg...@gmail.com> wrote:
> >>
> >> Hello Matthias John, thanks for your comments!! Replied them inline.
> >>
> >> I think there are a couple open questions that I'd like to hear your
> >> opinions on with the context:
> >>
> >> a. For stores's serdes, the reason I proposed to expose a set of serde
> >> names instead of a pair of key / value serdes is for future possible store
> >> types which may not be key-values. I admit it could just be over-killing
> >> here so if you have a strong preference on the latter, I could be convinced
> >> to change that part but I'd want to make the original motivation clear.
> >>
> >> b. I think I'm convinced that I'd just augment the `toString` result
> >> regardless of which func generated the Topology (and hence its
> >> TopologyDescription), note this would mean that we break the compatibility
> >> of the current `toString` function. As a remedy for that, we will also
> >> expose a `toJson` function to programmatical purposes.
> >>
> >> Guozhang
> >>
> >>
> >>> (1) In the new TopologyDescription output, the line for the
> >>> windowed-count processor is:
> >>>
> >>>>  Processor: myname (stores: [(myname-store, serdes:
> >> [SessionWindowedSerde, FullChangeSerde])])
> >>>
> >>> For this case, both Serdes are wrappers and user would actually only
> >>> specified wrapped Serdes for the key and value. Can we do anything about
> >>> this? Otherwise, there might still be a runtime `ClassCastException`
> >>> that a user cannot easily debug.
> >>>
> >>>
> >>> (2) Nit: The JavaDocs of `Processor#storeSet()` seems to be incorrect
> >>> (it says "The names of all connected stores." -- guess it's c&p error)?
> >>>
> >> Yes! Fixed.
> >>
> >>>
> >>> (3) The KIP mentioned to add `Store#changelogTopic()` method, but the
> >>> output of `TopologyDescription#toString()` does not contain it. I think
> >>> it might be good do add it, too?
> >>>
> >> Yes, that's right. I'm going to add to the example as well.
> >>
> >>>
> >>> (4) The KIP also list https://issues.apache.org/jira/browse/KAFKA-9913
> >>> but it seems not to address it yet?
> >>>
> >> I actually did intent to have it addressed; the proposal includes:
> >>
> >> a. Return the set of source / sink nodes of a sub-topology, and their
> >> corresponding source / sink topics could be accessed.
> >> b. Return the set of stores of a sub-topology, and their corresponding
> >> changelog topics could be accessed.
> >>
> >> The reason I did not choose to just expose the set of all topics directly,
> >> but indirectly, is stated in the wiki:
> >>
> >> "the reason we did not expose APIs for topic names directly is that for
> >> source nodes, it is possible to have Pattern and for sink nodes, it is
> >> possible to have topic-extractors, and hence it's better to let users
> >> leveraging on the lower-level APIs to construct the topic names
> >> programmatically themselves."
> >>
> >>>
> >>> (5) As John, I also noticed that `List<String> Store#sedersNames()` is
> >>> not a great API. I am not sure if I understand your reply thought.
> >>> AFAIK, there is no exiting API
> >>>
> >>>> List<Serde> StoreBuilder#serdes()
> >>>
> >>> (cf
> >>>
> >> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/StoreBuilder.java
> >> )
> >>>
> >> Ah yes, that's added as part of this KIP.
> >>
> >>>
> >>> (6) Atm, we return `String` type for the Serdes. Do we think it's
> >>> sufficient? Just want to double check.
> >>
> >> The reason is that we can only get the serde-name at the time of the
> >> topology since it may be from config and hence there's no serde object
> >> actually.
> >>
> >>> (10) Can we avoid coupling this KIP’s behavior to the choice of ‘build’
> >> method? I.e., can we return the improved description even when people just
> >> call ‘build()’?
> >>
> >> Yes, as I replied in the above comment to yours, I've changed my mind to
> >> just return the augmented description no matter of the function; and will
> >> expose toJson() for future compatibilities. I've not yet updated the wiki
> >> yet.
> >>
> >>> Clearly, we need a placeholder if no serde is specified. How about
> >> “unknown”, or the name of the config keys,
> >> “default.key.serde”/“default.value.serde”?
> >>
> >> I think if `build(props)` is used, we can use the name of the configured
> >> values; otherwise since we do not know the config yet we'd have to use
> >> "unknown".
> >>
> >>> I still have some deep reservation about the ‘build(Parameters)’ method
> >> itself. I don’t really want to side-track this conversation with all my
> >> concerns if we can avoid it, though. It seems like justification enough
> >> that introducing dramatically different behavior based in on seemingly
> >> minor differences in api calls will be a source of mystery and complexity
> >> for users.
> >>
> >>> I.e., I’m characterizing a completely different string format as
> >> “dramatically different”, as opposed to just having a placeholder string.
> >>
> >>> (11) Regarding the wrapper serdes, I bet we can capture and print the
> >> inner types as well.
> >>
> >> Ack, I can do that.
> >>
> >> On Sat, May 2, 2020 at 8:19 AM John Roesler <vvcep...@apache.org> wrote:
> >>
> >>> Hi all,
> >>>
> >>> I’ve been sitting on another concern about this proposal. Since Matthias
> >>> has just submitted a few questions, perhaps I can pile on two more this
> >>> round.
> >>>
> >>> (10) Can we avoid coupling this KIP’s behavior to the choice of ‘build’
> >>> method? I.e., can we return the improved description even when people just
> >>> call ‘build()’?
> >>>
> >>> Clearly, we need a placeholder if no serde is specified. How about
> >>> “unknown”, or the name of the config keys,
> >>> “default.key.serde”/“default.value.serde”?
> >>>
> >>> I still have some deep reservation about the ‘build(Parameters)’ method
> >>> itself. I don’t really want to side-track this conversation with all my
> >>> concerns if we can avoid it, though. It seems like justification enough
> >>> that introducing dramatically different behavior based in on seemingly
> >>> minor differences in api calls will be a source of mystery and complexity
> >>> for users.
> >>>
> >>> I.e., I’m characterizing a completely different string format as
> >>> “dramatically different”, as opposed to just having a placeholder string.
> >>>
> >>> (11) Regarding the wrapper serdes, I bet we can capture and print the
> >>> inner types as well.
> >>>
> >>> Thanks again for the KIP!
> >>> -John
> >>>
> >>>
> >>>
> >>> On Thu, Apr 30, 2020, at 19:10, Matthias J. Sax wrote:
> >>>> Guozhang,
> >>>>
> >>>> thanks for the KIP!
> >>>>
> >>>> Couple of comments/questions.
> >>>>
> >>>
> >>>>
> >>>>
> >>>>
> >>>> -Matthias
> >>>>
> >>>>
> >>>>
> >>>>
> >>>> On 4/25/20 1:24 PM, Guozhang Wang wrote:
> >>>>> Hi John,
> >>>>>
> >>>>> Thanks for the review! Replied inline.
> >>>>>
> >>>>> On Fri, Apr 24, 2020 at 8:09 PM John Roesler <vvcep...@apache.org>
> >>> wrote:
> >>>>>
> >>>>>> Hi Guozhang,
> >>>>>>
> >>>>>> Thanks for the KIP! I took a quick look, and I'm really happy to see
> >>> this
> >>>>>> underway.
> >>>>>>
> >>>>>> Some quick questions:
> >>>>>>
> >>>>>> 1.  Can you elaborate on the reason that stores just have a list of
> >>>>>> serdes, whereas
> >>>>>> other components have an explicit key/value serde?
> >>>>>>
> >>>>>
> >>>>> This is because of the existing API "List<Serde>
> >>> StoreBuilder#serdes()".
> >>>>> Although both of its implementations would return two serdes (one for
> >>> key
> >>>>> and one for value), the API is more general to return a list. And
> >>> hence the
> >>>>> TopologyDescription#Store which gets them directly from StoreBuilder is
> >>>>> exposing the same API.
> >>>>>
> >>>>> 1.5. A side-effect of this seems to be that the string-formatted serde
> >>>>>> description is
> >>>>>> different, depending on whether the serdes are listed on a store or a
> >>>>>> topic. Just an
> >>>>>> observation.
> >>>>>>
> >>>>>
> >>>>> Yes I agree. I think we can probably change the "List<Serde>
> >>>>> StoreBuilder#serdes()" signature as well (which would be a breaking
> >>> change
> >>>>> though, so we should do that via deprecation), but I'm a bit concerned
> >>>>> since it was designed for future store types which may not be of K-V
> >>> format
> >>>>> any more.
> >>>>>
> >>>>>
> >>>>>> 2. You mentioned the key compatibility concern in my mind. We do know
> >>> that
> >>>>>> such
> >>>>>> use cases exist. Namely, our own tests and
> >>>>>> https://zz85.github.io/kafka-streams-viz/
> >>>>>> I'm wondering if we can add a forward-compatible machine-readable
> >>> format
> >>>>>> to the
> >>>>>> KIP, so that even though we must break the parsers right now, maybe
> >>> we'll
> >>>>>> never
> >>>>>> have to break them again. For example, I'm thinking of a "toJson"
> >>> method
> >>>>>> on the
> >>>>>> TopologyDescription that formats the entire topology description as a
> >>> json
> >>>>>> string.
> >>>>>>
> >>>>>>
> >>>>> Yes, I also have concerns about that (as described in the compatibility
> >>>>> section). One proposal I have is that we ONLY augment the toString
> >>> result
> >>>>> if the TopologyDescription is from a Topology built from
> >>>>> `StreamsBuilder#build(Properties)`, which is only recently added and
> >>> hence
> >>>>> most old usage would not get the benefits of it. But after thinking
> >>> about
> >>>>> this a bit more, I'm now more inclined to just always augment the
> >>> string,
> >>>>> and also add a `toJson` method in addition to `toString`.
> >>>>>
> >>>>>
> >>>>>> Thanks again!
> >>>>>> -John
> >>>>>>
> >>>>>> On Fri, Apr 24, 2020, at 00:26, Guozhang Wang wrote:
> >>>>>>> Hello folks,
> >>>>>>>
> >>>>>>> I'd like to start the discussion for KIP-598:
> >>>>>>>
> >>>>>>>
> >>>>>>
> >>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148648762
> >>>>>>>
> >>>>>>> It proposes to augment the topology description's sub-classes with
> >>> store
> >>>>>>> and source / sink serde information. Let me know what you think,
> >>> thanks!
> >>>>>>>
> >>>>>>> --
> >>>>>>> -- Guozhang
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>
> >>>>
> >>>> Attachments:
> >>>> * signature.asc
> >>>
> >>
> >>
> >> --
> >> -- Guozhang
>

Re: [DISCUSS] KIP-598: Augment TopologyDescription with store and source / sink serde information

Reply via email to