[ https://issues.apache.org/jira/browse/KAFKA-10179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17143238#comment-17143238 ]

Sophie Blee-Goldman commented on KAFKA-10179:
---------------------------------------------

[~desai.p.rohan] I'm not sure I understand why it's a problem for the 
deserializer to modify the value slightly, by dropping fields as in your 
example. We would end up restoring the full bytes into the store, sure, but the 
plain bytes are never actually used, right? We would always go through the 
deserializer when reading the value from the store and using it in an 
operation, so the "extra" fields would still get dropped.
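
To make that concrete, here's a minimal sketch of such a value deserializer. 
The "payload|debugInfo" string wire format and the class name are made up for 
illustration, not taken from the actual issue: even if a restore copies the 
full original bytes back into the store, every read goes through deserialize() 
again, so the dropped part never reaches the topology.

    import java.nio.charset.StandardCharsets;
    import org.apache.kafka.common.serialization.Deserializer;

    // Hypothetical value deserializer that keeps only the part of the payload
    // the topology cares about. The assumed wire format is "payload|debugInfo".
    public class TrimmingDeserializer implements Deserializer<String> {
        @Override
        public String deserialize(String topic, byte[] data) {
            if (data == null) {
                return null;
            }
            String raw = new String(data, StandardCharsets.UTF_8);
            int sep = raw.indexOf('|');
            // Drop everything after the separator. The restored changelog bytes
            // may still contain it, but any read from the store passes through
            // this method before the value is used in an operation.
            return sep < 0 ? raw : raw.substring(0, sep);
        }
    }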

Maybe if your values are bloated with a lot of extra information that you 
didn't want to store, this could blow up the disk usage. But I think there's a 
difference between a simple operation on the data to extract only the relevant 
bits, e.g. dropping a field you don't care about, and fundamentally 
transforming the data to get it into a different form. The former seems 
reasonable to do during deserialization, but the latter should be its own 
operation in the topology.

Of course, this just applies to modifying the values. If your deserializer 
modifies the key in any way, this would be a problem since lookups by key would 
fail after a restoration copies over the plain bytes. But I would argue that 
it's illegal to modify the key during de/serialization at all, not because of 
the restoration issue but because it can cause incorrect partitioning.

Anyways, I'm probably overlooking something obvious, but I'm struggling to see 
exactly where and how this breaks. That said, I do agree we should clarify that 
`serialize(deserialize())` must be the identity for keys.
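
For illustration, a minimal round-trip check of that property, using the 
built-in String serde as a stand-in (any key serde would need to satisfy it):

    import java.util.Arrays;
    import org.apache.kafka.common.serialization.Serde;
    import org.apache.kafka.common.serialization.Serdes;

    // serialize(deserialize(bytes)) must give back the same bytes for keys:
    // a restore writes the raw changelog bytes into the store, and a later
    // lookup serializes the key and does a byte-wise get against them.
    public class KeyRoundTripCheck {
        public static void main(String[] args) {
            Serde<String> keySerde = Serdes.String();
            String topic = "some-topic";

            byte[] wireKey = keySerde.serializer().serialize(topic, "user-42");
            String key = keySerde.deserializer().deserialize(topic, wireKey);
            byte[] lookupKey = keySerde.serializer().serialize(topic, key);

            // If this were false, keys restored as plain bytes could no longer
            // be found, and partitioning by key bytes would drift as well.
            System.out.println(Arrays.equals(wireKey, lookupKey)); // true
        }
    }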

> State Store Passes Wrong Changelog Topic to Serde for Optimized Source Tables
> -----------------------------------------------------------------------------
>
>                 Key: KAFKA-10179
>                 URL: https://issues.apache.org/jira/browse/KAFKA-10179
>             Project: Kafka
>          Issue Type: Bug
>          Components: streams
>    Affects Versions: 2.5.0
>            Reporter: Bruno Cadonna
>            Assignee: Bruno Cadonna
>            Priority: Major
>             Fix For: 2.7.0
>
>
> {{MeteredKeyValueStore}} passes the name of the changelog topic of the state 
> store to the state store serdes. Currently, it always passes {{<application 
> ID>-<store name>-changelog}} as the changelog topic name. However, for 
> optimized source tables the changelog topic is the source topic. 
> Most serdes do not use the topic name passed to them. However, if the serdes 
> actually use the topic name for (de)serialization, e.g., when Kafka Streams 
> is used with Confluent's Schema Registry, a 
> {{org.apache.kafka.common.errors.SerializationException}} is thrown.
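
For readers unfamiliar with topic-aware serdes, a rough sketch of the failure 
mode described in the issue. The subject naming mimics the Schema Registry 
"<topic>-value" convention, but the class name, topic names, and the in-memory 
registry stand-in are invented for illustration:

    import java.nio.charset.StandardCharsets;
    import java.util.Map;
    import org.apache.kafka.common.errors.SerializationException;
    import org.apache.kafka.common.serialization.Deserializer;

    // Rough sketch of a topic-aware deserializer: it derives a schema subject
    // from the topic name, as in the "<topic>-value" Schema Registry convention.
    // If the store hands it "<application ID>-<store name>-changelog" instead of
    // the real source topic, the subject is unknown and deserialization fails.
    public class TopicAwareDeserializer implements Deserializer<String> {
        // Stand-in for a schema registry: known subject -> schema identifier.
        private final Map<String, String> knownSubjects =
            Map.of("input-topic-value", "schema-1");

        @Override
        public String deserialize(String topic, byte[] data) {
            String subject = topic + "-value";
            if (!knownSubjects.containsKey(subject)) {
                throw new SerializationException("Unknown subject: " + subject);
            }
            return new String(data, StandardCharsets.UTF_8);
        }
    }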



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
