Daniyar, thanks for the update to the KIP. It's in really good shape and well written.
About the default constructor question: All Serdes/Serializer/Deserializer classes need a default constructor to create them easily via reflections when specifies in a config. I understand that it is not super user friendly, but all existing code works this way. Hence, it seems best to stick with the established pattern. We have a similar issue with `TimeWindowedSerde` and `SessionWindowedSerde`, and I just recently did a PR to improve user experience that address the exact issue John raised. (cf https://github.com/apache/kafka/pull/7067) Note, that if a user would instantiate the Serde manually, the user would also need to call `configure()` to setup the inner serdes. Kafka Streams would not setup those automatically and one might most likely end-up with an NPE. Coming back the KIP, and the parameter names. `WindowedSerdes` are similar to `ListSerde` as they wrap another Serde. For `WindowedSerdes`, we use the following parameter names: - default.windowed.key.serde.inner - default.windowed.value.serde.inner It might be good to align the naming pattern. I would also suggest to use `type` instead of `impl`? default.key.list.serde.impl -> default.list.key.serde.type default.value.list.serde.impl -> default.list.value.serde.type default.key.list.serde.element -> default.list.key.serde.inner default.value.list.serde.element -> default.list.value.serde.inner -Matthias On 7/10/19 8:52 AM, Development wrote: > Hi John, > > Yes, I do agree. That totally makes sense. The only thing is that it goes > against what Matthias suggested earlier: > "I think that ... `ListSerde` should have an default constructor and it > should be possible to pass in the `Class listClass` information via a > configuration. Otherwise, KafkaStreams cannot use it as default serde.” > > What do you think about that? I hope I’m not confusing anything. > > Best, > Daniyar Yeralin > >> On Jul 9, 2019, at 5:56 PM, John Roesler <j...@confluent.io> wrote: >> >> Ah, my apologies, I must have just overlooked it. Thanks for the update, too. >> >> Just one more super-small question, do we need this variant: >> >>> New method public static <T> Serde<List<T>> ListSerde() in >>> org.apache.kafka.common.serialization.Serdes class (infers list >>> implementation and inner serde from config file) >> >> It seems like this situation implies my config file is already set up for >> the list serde, so passing this serde (e.g., in Produced) would have the >> same effect as not specifying it. >> >> I guess that it could be the case that you have the >> `default.key/value.serde` set to something else, like StringSerde, but you >> still have the `default.key/value.list.serde.impl/element` set. This seems >> like it would result in more confusion than convenience, so my gut instinct >> is maybe we shouldn't introduce the `ListSerde()` variant until people >> actually request it later on. >> >> Thus, we'd just stick with fully config-driven or fully source-code-driven, >> not half/half. >> >> What do you think? >> >> Thanks, >> -John >> >> >> On Tue, Jul 9, 2019 at 9:58 AM Development <d...@yeralin.net >> <mailto:d...@yeralin.net>> wrote: >>> >>> Hi John, >>> >>> I hope everyone had a great long weekend. >>> >>> Regarding Java interfaces, I may not understand you correctly, but I think >>> I already listed them: >>> >>> So for Produced, you would use it in the following fashion, for example: >>> Produced.keySerde(Serdes.ListSerde(ArrayList.class, Serdes.Integer())) >>> >>> I also updated the KIP, and added a section “Serialization Strategy” where >>> I describe our logic of conditional serialization based on the type of an >>> inner serde. >>> >>> Thank you! >>> >>> Best, >>> Daniyar Yeralin >>> >>> On Jun 26, 2019, at 11:44 AM, John Roesler <j...@confluent.io >>> <mailto:j...@confluent.io>> wrote: >>> >>> Thanks for the update, Daniyar! >>> >>> In addition to specifying the config interface, can you also specify >>> the Java interface? Namely, if I need to pass an instance of this >>> serde in to the DSL directly, as in Produced, Materialized, etc., what >>> constructor(s) would I have available? Likewise with the Serializer >>> and Deserailizer. I don't think you need to specify the implementation >>> logic, since we've already discussed it here. >>> >>> If you also want to specify the serialized format of the data records >>> in the KIP, it could be useful documentation, as well as letting us >>> verify the schema for forward/backward compatibility concerns, etc. >>> >>> Thanks, >>> John >>> >>> On Wed, Jun 26, 2019 at 10:33 AM Development <d...@yeralin.net >>> <mailto:d...@yeralin.net>> wrote: >>> >>> >>> Hey, >>> >>> Finally made updates to the KIP: >>> https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization >>> >>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466%3A+Add+support+for+List%3CT%3E+serialization+and+deserialization> >>> >>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization >>> >>> <https://cwiki.apache.org/confluence/display/KAFKA/KIP-466:+Add+support+for+List%3CT%3E+serialization+and+deserialization>> >>> Sorry for the delay :) >>> >>> Thank You! >>> >>> Best, >>> Daniyar Yeralin >>> >>> On Jun 22, 2019, at 12:49 AM, Matthias J. Sax <matth...@confluent.io >>> <mailto:matth...@confluent.io>> wrote: >>> >>> Yes, something like this. I did not think about good configuration >>> parameter names yet. I am also not sure if I understand all proposed >>> configs atm. But all configs should be listed and explained in the KIP >>> anyway, and we can discuss further after you have updated the KIP (I can >>> ask more detailed question if I have any). >>> >>> >>> -Matthias >>> >>> On 6/21/19 2:05 PM, Development wrote: >>> >>> Yes, you are right. ByteSerializer is not what I need to have in a list >>> of primitives. >>> >>> As for the default constructor and configurability, just want to make >>> sure. Is this what you have on your mind? >>> >>> Best, >>> Daniyar Yeralin >>> >>> >>> >>> On Jun 21, 2019, at 2:51 PM, Matthias J. Sax <matth...@confluent.io >>> <mailto:matth...@confluent.io> >>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>>> wrote: >>> >>> Thanks for the update! >>> >>> I think that `ListDeserializer`, `ListSerializer`, and `ListSerde` >>> should have an default constructor and it should be possible to pass in >>> the `Class listClass` information via a configuration. Otherwise, >>> KafkaStreams cannot use it as default serde. >>> >>> >>> For the primitive serializers: `BytesSerializer` is not primitive IMHO, >>> as is it for `byte[]` with variable length -- it's for arrays, not for >>> single `byte` (note, that `Bytes` is a Kafka class wrapping `byte[]`). >>> >>> >>> For tests, we can comment on the PR. No need to do this in the KIP >>> discussion. >>> >>> >>> Can you also update the KIP? >>> >>> >>> >>> -Matthias >>> >>> >>> >>> >>> >>> On 6/21/19 11:29 AM, Development wrote: >>> >>> I made and pushed necessary commits, so we could review the final >>> version under PR https://github.com/apache/kafka/pull/6592 >>> <https://github.com/apache/kafka/pull/6592> >>> >>> I also need some advice on writing tests for this new serde. So far I >>> only have two test cases (roundtrip and empty payload), I’m not sure >>> if it is enough. >>> >>> Thank y’all for your help in this KIP :) >>> >>> Best, >>> Daniyar Yeralin >>> >>> >>> On Jun 21, 2019, at 1:44 PM, John Roesler <j...@confluent.io >>> <mailto:j...@confluent.io> >>> <mailto:j...@confluent.io <mailto:j...@confluent.io>>> wrote: >>> >>> Hey Daniyar, >>> >>> Looks good to me! Thanks for considering it. >>> >>> Thanks, >>> -John >>> >>> On Fri, Jun 21, 2019 at 9:04 AM Development <d...@yeralin.net >>> <mailto:d...@yeralin.net> >>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> >>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> wrote: >>> Hey John and Matthias, >>> >>> Yes, now I see it all. I’m storing lots of redundant information. >>> Here is my final idea. Yes, now a user should pass a list type. I >>> realized that’s the type is not really needed in ListSerializer, but >>> only in ListDeserializer: >>> >>> >>> In ListSerializer we will start storing sizes only if serializer is >>> not a primitive serializer: >>> >>> >>> Then, in deserializer, we persist passed list type, so that during >>> deserialization we could create an instance of it with predefined >>> listSize for better performance. >>> We also try to locate a primitiveSize based on passed deserializer. >>> If it is not there, then primitiveSize will be null. Which means >>> that each entry’s size was encoded individually. >>> >>> >>> This looks much cleaner and more concise. >>> >>> What do you think? >>> >>> Best, >>> Daniyar Yeralin >>> >>> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matth...@confluent.io >>> <mailto:matth...@confluent.io> >>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>> >>> <mailto:matth...@confluent.io <mailto:matth...@confluent.io>>> wrote: >>> >>> For encoding the list-type: I see John's point about re-encoding the >>> list-type redundantly. However, I also don't like the idea that the >>> Deserializer returns a fixed type... >>> >>> Maybe it's best allow users to specify the target list type on >>> deserialization via config? >>> >>> Similar for the primitive types: I don't think we need to encode the >>> type size, but users could specify the type on the deserializer (via a >>> config again)? >>> >>> >>> About generics: nesting could be arbitrarily deep. Hence, I doubt >>> we can >>> support this and a cast will be necessary at some point in the user >>> code. >>> >>> >>> >>> -Matthias >>> >>> >>> >>> On 6/20/19 1:21 PM, John Roesler wrote: >>> >>> Hey Daniyar, >>> >>> Thanks for looking at it! >>> >>> Something like your screenshot is more along the lines of what I was >>> thinking. Sorry, but I didn't follow what you mean, how would that not >>> be "vanilla java"? >>> >>> Unfortunately the deserializer needs more information, though. For >>> example, what if the inner type is a Map<String,String>? The serde >>> could >>> only be used to produce a LinkedList<Map>, thus, we'd still need an >>> inner serde, like you have in the KIP (Serde<T> innerSerde). >>> >>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde( >>> /**list type**/ LinkedList.class, >>> /**inner serde**/ new MyRecordSerde() >>> ) >>> >>> And in configuration, it's something like: >>> default.key.serde: org...ListSerde >>> default.key.list.serde.type: java.util.LinkedList >>> default.key.list.serde.inner: com.mycompany.MyRecordSerde >>> >>> >>> What do you think? >>> Thanks, >>> -John >>> >>> On Thu, Jun 20, 2019 at 2:46 PM Development <d...@yeralin.net >>> <mailto:d...@yeralin.net> >>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> >>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> >>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> <mailto:d...@yeralin.net >>> <mailto:d...@yeralin.net>>>> wrote: >>> >>> Hey John, >>> >>> I gave read about TypeReference. It could work for the list serde. >>> However, it is not directly >>> supported: >>> https://github.com/FasterXML/jackson-databind/issues/1490 >>> <https://github.com/FasterXML/jackson-databind/issues/1490> >>> <https://github.com/FasterXML/jackson-databind/issues/1490 >>> <https://github.com/FasterXML/jackson-databind/issues/1490>> >>> The only way is to pass an actual class object into the constructor, >>> something like: >>> >>> It could be an option, but not a pretty one. What do you think of my >>> approach to use vanilla java and canonical class name? (As described >>> previously) >>> >>> Best, >>> Daniyar Yeralin >>> >>> On Jun 20, 2019, at 2:45 PM, Development <d...@yeralin.net >>> <mailto:d...@yeralin.net> >>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> >>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> >>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>>> wrote: >>> >>> Hi John, >>> >>> Thank you for your input! Yes, my idea looks a little bit over >>> engineered :) >>> >>> I also wanted to see a feedback from Mathias as well since he gave >>> me an idea about storing fixed/variable size entries. >>> >>> Best, >>> Daniyar Yeralin >>> >>> On Jun 18, 2019, at 6:06 PM, John Roesler <j...@confluent.io >>> <mailto:j...@confluent.io> >>> <mailto:j...@confluent.io <mailto:j...@confluent.io>> >>> <mailto:j...@confluent.io <mailto:j...@confluent.io>> >>> <mailto:j...@confluent.io <mailto:j...@confluent.io> >>> <mailto:j...@confluent.io <mailto:j...@confluent.io>>>> wrote: >>> >>> Hi Daniyar, >>> >>> That's a very clever solution! >>> >>> One observation is that, now, this is what we might call a >>> polymorphic >>> serde. That is, you're detecting the actual concrete type and then >>> promising to produce the exact same concrete type on read. >>> There are >>> some inherent problems with this approach, which in general >>> require >>> some kind of schema registry (not necessarily Schema >>> Registry, just >>> any registry for schemas) to solve. >>> >>> Notice that every serialized record has quite a bit of duplicated >>> information: the concrete type as well as a byte to indicate >>> whether >>> the value type is a fixed size, and, if so, an integer to >>> indicate the >>> actual size. These constitute a schema, of sorts, because they >>> tell us >>> later how exactly to deserialize the data. Unfortunately, this >>> information is completely redundant. In all likelihood, the >>> information will be exactly the same for every record in the >>> topic. >>> This problem is essentially the core motivation for serializations >>> like Avro: to move the schema outside of the serialization >>> itself, so >>> that the records won't contain so much redundant information. >>> >>> In this light, I'm wondering if it makes sense to go back to >>> something >>> like what you had earlier in which you don't support perfectly >>> preserving the concrete type for _this_ serde, but instead just >>> support deserializing to _some_ List. Then, you could defer full, >>> perfect, type preservation to serdes that have an external >>> system in >>> which to register their type information. >>> >>> There does exist an alternative, if we really do want to >>> preserve the >>> concrete type (which does seem kind of nice). You can add a >>> configuration option specifically for the serde to configure >>> what the >>> list type will be, and maybe what the element type is, as well. >>> >>> As far as "related work" goes, you might be interested to take >>> a look >>> at how Jackson can be configured to deserialize into a specific, >>> arbitrarily nested, generically parameterized class structure. >>> Specifically, you might find >>> >>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>> >>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html> >>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>> >>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html>> >>> interesting. >>> >>> Thanks, >>> -John >>> >>> On Mon, Jun 17, 2019 at 12:38 PM Development <d...@yeralin.net >>> <mailto:d...@yeralin.net> >>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> >>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>> >>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net> >>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>>> wrote: >>> >>> >>> bump >>> >>> >>> >>> >>> >>> >>> >>> > >
signature.asc
Description: OpenPGP digital signature