[ 
https://issues.apache.org/jira/browse/KAFKA-7654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16694474#comment-16694474
 ] 

Bruno Bieth commented on KAFKA-7654:
------------------------------------

bq. It's a Java limitation that we lose type safety for default Serdes

I think it's a design issue rather than a language issue. Even with reified 
generics, a global configuration that's set once and carries a single pair of 
Serdes (for the whole application!) isn't going to work. `ProcessorContext` 
could be `ProcessorContext<K,V>`, but then you'd be stuck getting the 
`Serde<K>` / `Serde<V>` from the global config.
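
To illustrate, here is a minimal sketch in plain Java with no Kafka 
dependency. `Serde`, `GlobalConfig` and `valueSerde()` are hypothetical 
stand-ins, not the real Kafka Streams types. It only shows why a single 
application-wide default forces an unchecked cast whose failure shows up at 
runtime rather than at compile time:

```java
import java.nio.charset.StandardCharsets;

public class GlobalDefaultSerdeSketch {

    // Hypothetical stand-in for a serde; not the Kafka interface.
    interface Serde<T> {
        byte[] serialize(T value);
    }

    // One value serde for the whole application, so its type parameter
    // can only be stored as a wildcard.
    static final class GlobalConfig {
        private final Serde<?> defaultValueSerde;

        GlobalConfig(Serde<?> defaultValueSerde) {
            this.defaultValueSerde = defaultValueSerde;
        }

        // The caller picks V freely; the compiler cannot check it against
        // whatever serde was configured.
        @SuppressWarnings("unchecked")
        <V> Serde<V> valueSerde() {
            return (Serde<V>) defaultValueSerde;
        }
    }

    // Returns true if serializing an Integer with the String default
    // blows up at runtime.
    public static boolean failsAtRuntime() {
        Serde<String> stringSerde = s -> s.getBytes(StandardCharsets.UTF_8);
        GlobalConfig config = new GlobalConfig(stringSerde);

        Serde<Integer> wrong = config.valueSerde(); // compiles without error
        try {
            wrong.serialize(42);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("fails at runtime: " + failsAtRuntime());
    }
}
```

Making the config generic would not help here: there is still only one 
configured pair, while an application typically handles many key and value 
types.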

bq. They are useful if for example, all your data is always in AVRO or JSON.

Ok, I see. This is based on the assumption that the user of the API will be 
serializing their types using an inherently non-type-safe library. Say Jackson, 
which will do a best-effort runtime serialization job on any type (and fail, 
at runtime, otherwise). But if you're using a type-safe library like circe (as 
we do), then you suffer from that assumption.
I guess there's another assumption, which is that the source and sink formats 
are controlled by the streams application. In our case we control neither the 
source nor the sink format, i.e. we get our JSON from team A and we send it to 
team B, each with their own formats. I thought that would be a common case?

bq. I also don't see why it "leaks to the API" – casts are only required in 
internal classes. The user-facing public DSL API does not require any casts (or 
do I miss something?).

It leaks in the form of a non-intuitive API, one that, for instance, takes both 
a Materialized and a Produced (`table`) because the Materialized needs to be 
overridden so that the defaults (which aren't type-safe) aren't used.

bq. Whether a KStream knows its Serde or only knows its deserializer seems to 
be a runtime detail

Again, that only holds if you consider serialization to happen purely at 
runtime (using reflection), which IMO is a bad thing. Look up circe, it's a 
great library that is worth moving to Scala for ;)

bq. but nothing a user should see in the API from my point of view.

The user is already seeing this when they have to work around the default 
Serde, as in `table`: you have 1) a default pair of Serdes in the global 
config, 2) serdes in the Materialized and 3) serdes as a parameter of the 
`table` method. With my suggestion you only have one pair of Serdes. From the 
user's perspective it's a whole lot easier to reason about.
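
As a rough illustration of the "one pair" idea, the plain-Java sketch below 
lets a typed handle carry its own serdes, so nothing falls back to an untyped 
global default. `Serde`, `TypedTable`, `table` and `demo` are hypothetical 
names made up for this example; this is neither the Kafka Streams API nor a 
concrete proposal for it.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

public class SingleSerdePairSketch {

    // Hypothetical stand-in for a serde; not the Kafka interface.
    interface Serde<T> {
        byte[] serialize(T value);
        T deserialize(byte[] bytes);
    }

    static final Serde<String> STRING = new Serde<String>() {
        public byte[] serialize(String value) {
            return value.getBytes(StandardCharsets.UTF_8);
        }
        public String deserialize(byte[] bytes) {
            return new String(bytes, StandardCharsets.UTF_8);
        }
    };

    static final Serde<Integer> INT = new Serde<Integer>() {
        public byte[] serialize(Integer value) {
            return ByteBuffer.allocate(4).putInt(value).array();
        }
        public Integer deserialize(byte[] bytes) {
            return ByteBuffer.wrap(bytes).getInt();
        }
    };

    // The table owns the only serde pair; K and V are tied to it by the
    // compiler, so there is nothing to override later.
    static final class TypedTable<K, V> {
        final String topic;
        final Serde<K> keySerde;
        final Serde<V> valueSerde;

        TypedTable(String topic, Serde<K> keySerde, Serde<V> valueSerde) {
            this.topic = topic;
            this.keySerde = keySerde;
            this.valueSerde = valueSerde;
        }

        // Serializes and deserializes a value with the table's own serde.
        V roundTripValue(V value) {
            return valueSerde.deserialize(valueSerde.serialize(value));
        }
    }

    static <K, V> TypedTable<K, V> table(String topic, Serde<K> k, Serde<V> v) {
        return new TypedTable<>(topic, k, v);
    }

    // Round-trips one value through a String/Integer table.
    public static int demo() {
        TypedTable<String, Integer> counts = table("counts", STRING, INT);
        // TypedTable<String, Integer> bad = table("counts", STRING, STRING);
        // ^ would be rejected by the compiler instead of failing at runtime
        return counts.roundTripValue(42);
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```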

As you said it's a matter of trade-offs: whether you want to support the "all 
your types are encoded/decoded by AVRO / Jackson at runtime" use case or have 
type-safe serialization and a cleaner API. Personally I would go for the 
type-safe solution, even if that means that Jackson users end up passing their 
`objectMapper` a couple more times. This will at least make them think about 
serialization, and avoid situations like "how come my Car format isn't right? 
Oh, there's a global serde which is made of a global objectMapper that is 
configured with a CustomCarDeserializer in some remote part of the application".

bq. it has the disadvantage that it introduces two classes that (from a 
business logic point of view) are the same thing

In this fluent style I don't think the end-user cares much about which builder 
classes are returned as long as the IDE suggests valid methods. The main point 
is discoverability.

> Relax requirements on serializing-only methods.
> -----------------------------------------------
>
>                 Key: KAFKA-7654
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7654
>             Project: Kafka
>          Issue Type: Improvement
>          Components: streams
>            Reporter: Bruno Bieth
>            Priority: Major
>
> Methods such as KStream#to shouldn't require a Produced as only the 
> serializing part is ever used.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
