Hi Matthias, Consuming a structured `TopologyDescription` object is not always the easiest option. If you want to use a topology description outside Kafka Streams, for instance to visualize it like with https://zz85.github.io/kafka-streams-viz/, a future-proof structured serialization format comes in handy. Another interesting application for the topology description would be to compare topologies as proposed in https://issues.apache.org/jira/browse/KAFKA-8307.
Best, Bruno On Fri, May 8, 2020 at 11:15 PM Matthias J. Sax <mj...@apache.org> wrote: > > Thanks for the reply Guozhang. > > > About the change to `toString()`: I don't think we need to have any > concerns about compatibility, because nobody should rely on the string > representation of an object. Adding `toJson()` is fine with me even if I > don't see much value: we expose a structured `TopologyDescription` > already and I would leave it to the user to translate it into a > different format if they need to. > > > > About KAFKA-9913: > > Some newly added methods are not marked as `<---- NEW FUNC` in the KIP > so I missed them. (In general, it might be best to omit existing methods > and only include newly added ones. Makes the KIP easier to read.) > > Traversing the `TopologyDescription` as your code example shows would be > an improvement to the current API. However, still seems to be "clumsy" > and more code to write for users. > > I would still argue that returning a list of all topics names to lift > the burden from users to write those nested loops would be helpful. For > pattern subscription and sink-topic-extractors we have multiple design > choices: > > - simply exclude them > - include patterns as their pattern-string > - include sink-topic-extractors with a predefined format like, > "Dynamic: <used-class-name" > > We could alse be more flexible and allow users to includes/exclude > patter-sources/dynamic-sink or add methods that only return pattern > sources or dynamic sinks. > > Thoughts? > > > > About Serdes key/value vs list: if we really want to be generic, I like > Bruno's idea to use a map. We should not only apply this to stores, but > also for source and sink topics (for consistency) though. > > > > About `build()` vs `build(Props)` -- both return a `Topology` and thus > `TopologyDescription` seems to be independent of which one was called. > Of course, if we use `build(Props)` we could add more serde information > into the `Topology` but this seems to be orthogonal to this KIP? Just > returning a `null` string for unknown serdes might be simplest (and we > document that `null` means, fall back to whatever `StreamsConfig` > specifies). > > (Btw: as proposed in KIP-591, we might deprecate `build()` in favor of > `build(Props)` anyway.) > > > > About wrapping serdes: it would be good to know in advance if/how we can > get the information about inner serdes. I am not sure atm if we would > need more API changes to get this info. The other (minor) question is > also, how this information would be presented to the use (as we only use > `String` types for Serde information. > > > > -Matthias > > > On 5/4/20 11:27 AM, Bruno Cadonna wrote: > > Hi Guozhang, > > > > Thank you for the KIP! > > > > Exposing also the inner types of the wrapper serdes would be > > important. For debugging as Matthias has already mentioned and to see > > more easily changes that are applied to a topology. > > > > I am also +1 on the `toJson()` method to easily access the topology > > description programmatically and to make the description backward > > compatible. > > > > Regarding `List<String> serdeNames();`, I would be in favour of a more > > expressive return type, like a Map that assigns labels to Serde names. > > For example, for key and value serdes the label could be "key" and > > "value". Or something similar. > > > > Best, > > Bruno > > > > > > > > On Mon, May 4, 2020 at 2:25 AM Guozhang Wang <wangg...@gmail.com> wrote: > >> > >> Hello Matthias John, thanks for your comments!! Replied them inline. > >> > >> I think there are a couple open questions that I'd like to hear your > >> opinions on with the context: > >> > >> a. For stores's serdes, the reason I proposed to expose a set of serde > >> names instead of a pair of key / value serdes is for future possible store > >> types which may not be key-values. I admit it could just be over-killing > >> here so if you have a strong preference on the latter, I could be convinced > >> to change that part but I'd want to make the original motivation clear. > >> > >> b. I think I'm convinced that I'd just augment the `toString` result > >> regardless of which func generated the Topology (and hence its > >> TopologyDescription), note this would mean that we break the compatibility > >> of the current `toString` function. As a remedy for that, we will also > >> expose a `toJson` function to programmatical purposes. > >> > >> Guozhang > >> > >> > >>> (1) In the new TopologyDescription output, the line for the > >>> windowed-count processor is: > >>> > >>>> Processor: myname (stores: [(myname-store, serdes: > >> [SessionWindowedSerde, FullChangeSerde])]) > >>> > >>> For this case, both Serdes are wrappers and user would actually only > >>> specified wrapped Serdes for the key and value. Can we do anything about > >>> this? Otherwise, there might still be a runtime `ClassCastException` > >>> that a user cannot easily debug. > >>> > >>> > >>> (2) Nit: The JavaDocs of `Processor#storeSet()` seems to be incorrect > >>> (it says "The names of all connected stores." -- guess it's c&p error)? > >>> > >> Yes! Fixed. > >> > >>> > >>> (3) The KIP mentioned to add `Store#changelogTopic()` method, but the > >>> output of `TopologyDescription#toString()` does not contain it. I think > >>> it might be good do add it, too? > >>> > >> Yes, that's right. I'm going to add to the example as well. > >> > >>> > >>> (4) The KIP also list https://issues.apache.org/jira/browse/KAFKA-9913 > >>> but it seems not to address it yet? > >>> > >> I actually did intent to have it addressed; the proposal includes: > >> > >> a. Return the set of source / sink nodes of a sub-topology, and their > >> corresponding source / sink topics could be accessed. > >> b. Return the set of stores of a sub-topology, and their corresponding > >> changelog topics could be accessed. > >> > >> The reason I did not choose to just expose the set of all topics directly, > >> but indirectly, is stated in the wiki: > >> > >> "the reason we did not expose APIs for topic names directly is that for > >> source nodes, it is possible to have Pattern and for sink nodes, it is > >> possible to have topic-extractors, and hence it's better to let users > >> leveraging on the lower-level APIs to construct the topic names > >> programmatically themselves." > >> > >>> > >>> (5) As John, I also noticed that `List<String> Store#sedersNames()` is > >>> not a great API. I am not sure if I understand your reply thought. > >>> AFAIK, there is no exiting API > >>> > >>>> List<Serde> StoreBuilder#serdes() > >>> > >>> (cf > >>> > >> https://github.com/apache/kafka/blob/trunk/streams/src/main/java/org/apache/kafka/streams/state/StoreBuilder.java > >> ) > >>> > >> Ah yes, that's added as part of this KIP. > >> > >>> > >>> (6) Atm, we return `String` type for the Serdes. Do we think it's > >>> sufficient? Just want to double check. > >> > >> The reason is that we can only get the serde-name at the time of the > >> topology since it may be from config and hence there's no serde object > >> actually. > >> > >>> (10) Can we avoid coupling this KIP’s behavior to the choice of ‘build’ > >> method? I.e., can we return the improved description even when people just > >> call ‘build()’? > >> > >> Yes, as I replied in the above comment to yours, I've changed my mind to > >> just return the augmented description no matter of the function; and will > >> expose toJson() for future compatibilities. I've not yet updated the wiki > >> yet. > >> > >>> Clearly, we need a placeholder if no serde is specified. How about > >> “unknown”, or the name of the config keys, > >> “default.key.serde”/“default.value.serde”? > >> > >> I think if `build(props)` is used, we can use the name of the configured > >> values; otherwise since we do not know the config yet we'd have to use > >> "unknown". > >> > >>> I still have some deep reservation about the ‘build(Parameters)’ method > >> itself. I don’t really want to side-track this conversation with all my > >> concerns if we can avoid it, though. It seems like justification enough > >> that introducing dramatically different behavior based in on seemingly > >> minor differences in api calls will be a source of mystery and complexity > >> for users. > >> > >>> I.e., I’m characterizing a completely different string format as > >> “dramatically different”, as opposed to just having a placeholder string. > >> > >>> (11) Regarding the wrapper serdes, I bet we can capture and print the > >> inner types as well. > >> > >> Ack, I can do that. > >> > >> On Sat, May 2, 2020 at 8:19 AM John Roesler <vvcep...@apache.org> wrote: > >> > >>> Hi all, > >>> > >>> I’ve been sitting on another concern about this proposal. Since Matthias > >>> has just submitted a few questions, perhaps I can pile on two more this > >>> round. > >>> > >>> (10) Can we avoid coupling this KIP’s behavior to the choice of ‘build’ > >>> method? I.e., can we return the improved description even when people just > >>> call ‘build()’? > >>> > >>> Clearly, we need a placeholder if no serde is specified. How about > >>> “unknown”, or the name of the config keys, > >>> “default.key.serde”/“default.value.serde”? > >>> > >>> I still have some deep reservation about the ‘build(Parameters)’ method > >>> itself. I don’t really want to side-track this conversation with all my > >>> concerns if we can avoid it, though. It seems like justification enough > >>> that introducing dramatically different behavior based in on seemingly > >>> minor differences in api calls will be a source of mystery and complexity > >>> for users. > >>> > >>> I.e., I’m characterizing a completely different string format as > >>> “dramatically different”, as opposed to just having a placeholder string. > >>> > >>> (11) Regarding the wrapper serdes, I bet we can capture and print the > >>> inner types as well. > >>> > >>> Thanks again for the KIP! > >>> -John > >>> > >>> > >>> > >>> On Thu, Apr 30, 2020, at 19:10, Matthias J. Sax wrote: > >>>> Guozhang, > >>>> > >>>> thanks for the KIP! > >>>> > >>>> Couple of comments/questions. > >>>> > >>> > >>>> > >>>> > >>>> > >>>> -Matthias > >>>> > >>>> > >>>> > >>>> > >>>> On 4/25/20 1:24 PM, Guozhang Wang wrote: > >>>>> Hi John, > >>>>> > >>>>> Thanks for the review! Replied inline. > >>>>> > >>>>> On Fri, Apr 24, 2020 at 8:09 PM John Roesler <vvcep...@apache.org> > >>> wrote: > >>>>> > >>>>>> Hi Guozhang, > >>>>>> > >>>>>> Thanks for the KIP! I took a quick look, and I'm really happy to see > >>> this > >>>>>> underway. > >>>>>> > >>>>>> Some quick questions: > >>>>>> > >>>>>> 1. Can you elaborate on the reason that stores just have a list of > >>>>>> serdes, whereas > >>>>>> other components have an explicit key/value serde? > >>>>>> > >>>>> > >>>>> This is because of the existing API "List<Serde> > >>> StoreBuilder#serdes()". > >>>>> Although both of its implementations would return two serdes (one for > >>> key > >>>>> and one for value), the API is more general to return a list. And > >>> hence the > >>>>> TopologyDescription#Store which gets them directly from StoreBuilder is > >>>>> exposing the same API. > >>>>> > >>>>> 1.5. A side-effect of this seems to be that the string-formatted serde > >>>>>> description is > >>>>>> different, depending on whether the serdes are listed on a store or a > >>>>>> topic. Just an > >>>>>> observation. > >>>>>> > >>>>> > >>>>> Yes I agree. I think we can probably change the "List<Serde> > >>>>> StoreBuilder#serdes()" signature as well (which would be a breaking > >>> change > >>>>> though, so we should do that via deprecation), but I'm a bit concerned > >>>>> since it was designed for future store types which may not be of K-V > >>> format > >>>>> any more. > >>>>> > >>>>> > >>>>>> 2. You mentioned the key compatibility concern in my mind. We do know > >>> that > >>>>>> such > >>>>>> use cases exist. Namely, our own tests and > >>>>>> https://zz85.github.io/kafka-streams-viz/ > >>>>>> I'm wondering if we can add a forward-compatible machine-readable > >>> format > >>>>>> to the > >>>>>> KIP, so that even though we must break the parsers right now, maybe > >>> we'll > >>>>>> never > >>>>>> have to break them again. For example, I'm thinking of a "toJson" > >>> method > >>>>>> on the > >>>>>> TopologyDescription that formats the entire topology description as a > >>> json > >>>>>> string. > >>>>>> > >>>>>> > >>>>> Yes, I also have concerns about that (as described in the compatibility > >>>>> section). One proposal I have is that we ONLY augment the toString > >>> result > >>>>> if the TopologyDescription is from a Topology built from > >>>>> `StreamsBuilder#build(Properties)`, which is only recently added and > >>> hence > >>>>> most old usage would not get the benefits of it. But after thinking > >>> about > >>>>> this a bit more, I'm now more inclined to just always augment the > >>> string, > >>>>> and also add a `toJson` method in addition to `toString`. > >>>>> > >>>>> > >>>>>> Thanks again! > >>>>>> -John > >>>>>> > >>>>>> On Fri, Apr 24, 2020, at 00:26, Guozhang Wang wrote: > >>>>>>> Hello folks, > >>>>>>> > >>>>>>> I'd like to start the discussion for KIP-598: > >>>>>>> > >>>>>>> > >>>>>> > >>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=148648762 > >>>>>>> > >>>>>>> It proposes to augment the topology description's sub-classes with > >>> store > >>>>>>> and source / sink serde information. Let me know what you think, > >>> thanks! > >>>>>>> > >>>>>>> -- > >>>>>>> -- Guozhang > >>>>>>> > >>>>>> > >>>>> > >>>>> > >>>> > >>>> > >>>> Attachments: > >>>> * signature.asc > >>> > >> > >> > >> -- > >> -- Guozhang >