[ https://issues.apache.org/jira/browse/KAFKA-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796519#comment-16796519 ]
Paul Whalen commented on KAFKA-7777:
------------------------------------

This is a very interesting idea that I am suddenly very excited about, and since my team has a somewhat related problem, I'll phrase it the way we've been thinking of it: we love that key-value state stores can be backed up to topics, but in our Streams application we want a much richer way of querying data than just by key. In a sense, {{range()}} partly solves this problem because it allows a different way of querying the store rather than just by exact key. But the real win would be a complete decoupling of the local state store implementation from how it changelogs to Kafka. The store wouldn't need to be just key-value with range like RocksDB; it could have a fancier on-disk structure that supports efficient querying or indexing of the data in many ways (I'm thinking SQLite). It would definitely increase fail-over/restore time, but that would be an acceptable/necessary tradeoff: if you're going to lay out the data in a totally different format for querying, you obviously have to pay the cost of that translation. What I'm proposing (completely decoupling the local state store implementation from how it changelogs to Kafka) is more useful for Processor API users, but it could also provide an API usable at the DSL level to enable what this JIRA is asking for (merely decoupling the serdes of the local state store from those of the changelog in Kafka).

> Decouple topic serdes from materialized serdes
> ----------------------------------------------
>
>                 Key: KAFKA-7777
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7777
>             Project: Kafka
>          Issue Type: Wish
>          Components: streams
>            Reporter: Maarten
>            Priority: Minor
>              Labels: needs-kip
>
> It would be valuable to us to have the encoding format in a Kafka topic
> decoupled from the encoding format used to cache the data locally in a Kafka
> Streams app.
> We would like to use the `range()` function in the interactive queries API to
> look up a series of results, but can't with our encoding scheme due to our
> keys being variable length.
> We use protobuf, but based on what I've read, Avro, FlatBuffers and Cap'n
> Proto have similar problems.
> Currently we use the following code to work around this problem:
> {code}
> builder
>     .stream("input-topic", Consumed.with(inputKeySerde, inputValueSerde))
>     .to("intermediate-topic", Produced.with(intermediateKeySerde, intermediateValueSerde));
> t1 = builder
>     .table("intermediate-topic", Consumed.with(intermediateKeySerde, intermediateValueSerde), t1Materialized);
> {code}
> With the encoding formats decoupled, the code above could be reduced to a
> single step, not requiring an intermediate topic.
> Based on feedback on my [SO
> question|https://stackoverflow.com/questions/53913571/is-there-a-way-to-separate-kafka-topic-serdes-from-materialized-serdes]
> a change that introduces this would impact state restoration when using an
> input topic for recovery.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
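To make the reporter's variable-length-key limitation concrete: byte-oriented stores like RocksDB compare serialized keys as unsigned byte arrays, so `range()` only works if the serde's byte encoding preserves the keys' logical order. Variable-length encodings such as protobuf's base-128 varints do not. The following is a minimal, self-contained Java sketch (not Kafka code; the class and helper names are illustrative) showing the order inversion:

```java
import java.util.Arrays;

// Sketch: protobuf-style varint encoding does not preserve numeric order
// under the unsigned lexicographic comparison a byte-oriented store uses,
// which is why range() over such serialized keys misbehaves.
public class VarintOrdering {

    // Protobuf-style base-128 varint encoding of a non-negative int.
    static byte[] encodeVarint(int value) {
        byte[] buf = new byte[5];
        int i = 0;
        while ((value & ~0x7F) != 0) {
            buf[i++] = (byte) ((value & 0x7F) | 0x80); // low 7 bits + continuation bit
            value >>>= 7;
        }
        buf[i++] = (byte) value;
        return Arrays.copyOf(buf, i);
    }

    // Unsigned lexicographic byte comparison, as a key-value store would do.
    static int compareUnsigned(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int cmp = Integer.compare(a[i] & 0xFF, b[i] & 0xFF);
            if (cmp != 0) return cmp;
        }
        return Integer.compare(a.length, b.length);
    }

    public static void main(String[] args) {
        byte[] k200 = encodeVarint(200); // 0xC8 0x01
        byte[] k300 = encodeVarint(300); // 0xAC 0x02
        // Numerically 200 < 300, but byte-wise 0xC8 > 0xAC, so the
        // serialized order is inverted and a range scan over these
        // keys would skip or misorder entries.
        System.out.println(compareUnsigned(k200, k300) > 0); // prints: true
    }
}
```

This is the same reason the intermediate-topic workaround above exists: it re-serializes keys with a serde whose byte order matches the logical order before materializing the store.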