I made and pushed necessary commits, so we could review the final version under PR https://github.com/apache/kafka/pull/6592
I also need some advice on writing tests for this new serde. So far I only have two test cases (roundtrip and empty payload), I’m not sure if it is enough. Thank y’all for your help in this KIP :) Best, Daniyar Yeralin > On Jun 21, 2019, at 1:44 PM, John Roesler <j...@confluent.io> wrote: > > Hey Daniyar, > > Looks good to me! Thanks for considering it. > > Thanks, > -John > > On Fri, Jun 21, 2019 at 9:04 AM Development <d...@yeralin.net > <mailto:d...@yeralin.net>> wrote: > Hey John and Matthias, > > Yes, now I see it all. I’m storing lots of redundant information. > Here is my final idea. Yes, now a user should pass a list type. I realized > that’s the type is not really needed in ListSerializer, but only in > ListDeserializer: > > > In ListSerializer we will start storing sizes only if serializer is not a > primitive serializer: > > > Then, in deserializer, we persist passed list type, so that during > deserialization we could create an instance of it with predefined listSize > for better performance. > We also try to locate a primitiveSize based on passed deserializer. If it is > not there, then primitiveSize will be null. Which means that each entry’s > size was encoded individually. > > > This looks much cleaner and more concise. > > What do you think? > > Best, > Daniyar Yeralin > >> On Jun 20, 2019, at 5:45 PM, Matthias J. Sax <matth...@confluent.io >> <mailto:matth...@confluent.io>> wrote: >> >> For encoding the list-type: I see John's point about re-encoding the >> list-type redundantly. However, I also don't like the idea that the >> Deserializer returns a fixed type... >> >> Maybe it's best allow users to specify the target list type on >> deserialization via config? >> >> Similar for the primitive types: I don't think we need to encode the >> type size, but users could specify the type on the deserializer (via a >> config again)? >> >> >> About generics: nesting could be arbitrarily deep. Hence, I doubt we can >> support this and a cast will be necessary at some point in the user code. >> >> >> >> -Matthias >> >> >> >> On 6/20/19 1:21 PM, John Roesler wrote: >>> Hey Daniyar, >>> >>> Thanks for looking at it! >>> >>> Something like your screenshot is more along the lines of what I was >>> thinking. Sorry, but I didn't follow what you mean, how would that not >>> be "vanilla java"? >>> >>> Unfortunately the deserializer needs more information, though. For >>> example, what if the inner type is a Map<String,String>? The serde could >>> only be used to produce a LinkedList<Map>, thus, we'd still need an >>> inner serde, like you have in the KIP (Serde<T> innerSerde). >>> >>> Something more like Serde<LinkedList<MyRecord>> = Serdes.listSerde( >>> /**list type**/ LinkedList.class, >>> /**inner serde**/ new MyRecordSerde() >>> ) >>> >>> And in configuration, it's something like: >>> default.key.serde: org...ListSerde >>> default.key.list.serde.type: java.util.LinkedList >>> default.key.list.serde.inner: com.mycompany.MyRecordSerde >>> >>> >>> What do you think? >>> Thanks, >>> -John >>> >>> On Thu, Jun 20, 2019 at 2:46 PM Development <d...@yeralin.net >>> <mailto:d...@yeralin.net> >>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> wrote: >>> >>> Hey John, >>> >>> I gave read about TypeReference. It could work for the list serde. >>> However, it is not directly >>> supported: https://github.com/FasterXML/jackson-databind/issues/1490 >>> <https://github.com/FasterXML/jackson-databind/issues/1490> >>> The only way is to pass an actual class object into the constructor, >>> something like: >>> >>> It could be an option, but not a pretty one. What do you think of my >>> approach to use vanilla java and canonical class name? (As described >>> previously) >>> >>> Best, >>> Daniyar Yeralin >>> >>>> On Jun 20, 2019, at 2:45 PM, Development <d...@yeralin.net >>>> <mailto:d...@yeralin.net> >>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> wrote: >>>> >>>> Hi John, >>>> >>>> Thank you for your input! Yes, my idea looks a little bit over >>>> engineered :) >>>> >>>> I also wanted to see a feedback from Mathias as well since he gave >>>> me an idea about storing fixed/variable size entries. >>>> >>>> Best, >>>> Daniyar Yeralin >>>> >>>>> On Jun 18, 2019, at 6:06 PM, John Roesler <j...@confluent.io >>>>> <mailto:j...@confluent.io> >>>>> <mailto:j...@confluent.io <mailto:j...@confluent.io>>> wrote: >>>>> >>>>> Hi Daniyar, >>>>> >>>>> That's a very clever solution! >>>>> >>>>> One observation is that, now, this is what we might call a >>>>> polymorphic >>>>> serde. That is, you're detecting the actual concrete type and then >>>>> promising to produce the exact same concrete type on read. There are >>>>> some inherent problems with this approach, which in general require >>>>> some kind of schema registry (not necessarily Schema Registry, just >>>>> any registry for schemas) to solve. >>>>> >>>>> Notice that every serialized record has quite a bit of duplicated >>>>> information: the concrete type as well as a byte to indicate whether >>>>> the value type is a fixed size, and, if so, an integer to >>>>> indicate the >>>>> actual size. These constitute a schema, of sorts, because they >>>>> tell us >>>>> later how exactly to deserialize the data. Unfortunately, this >>>>> information is completely redundant. In all likelihood, the >>>>> information will be exactly the same for every record in the topic. >>>>> This problem is essentially the core motivation for serializations >>>>> like Avro: to move the schema outside of the serialization itself, so >>>>> that the records won't contain so much redundant information. >>>>> >>>>> In this light, I'm wondering if it makes sense to go back to >>>>> something >>>>> like what you had earlier in which you don't support perfectly >>>>> preserving the concrete type for _this_ serde, but instead just >>>>> support deserializing to _some_ List. Then, you could defer full, >>>>> perfect, type preservation to serdes that have an external system in >>>>> which to register their type information. >>>>> >>>>> There does exist an alternative, if we really do want to preserve the >>>>> concrete type (which does seem kind of nice). You can add a >>>>> configuration option specifically for the serde to configure what the >>>>> list type will be, and maybe what the element type is, as well. >>>>> >>>>> As far as "related work" goes, you might be interested to take a look >>>>> at how Jackson can be configured to deserialize into a specific, >>>>> arbitrarily nested, generically parameterized class structure. >>>>> Specifically, you might find >>>>> >>>>> https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html >>>>> >>>>> <https://fasterxml.github.io/jackson-core/javadoc/2.0.0/com/fasterxml/jackson/core/type/TypeReference.html> >>>>> interesting. >>>>> >>>>> Thanks, >>>>> -John >>>>> >>>>> On Mon, Jun 17, 2019 at 12:38 PM Development <d...@yeralin.net >>>>> <mailto:d...@yeralin.net> >>>>> <mailto:d...@yeralin.net <mailto:d...@yeralin.net>>> wrote: >>>>>> >>>>>> bump >