[ 
https://issues.apache.org/jira/browse/KAFKA-7777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16796519#comment-16796519
 ] 

Paul Whalen commented on KAFKA-7777:
------------------------------------

This is a very interesting idea that I am suddenly very excited about, and
since my team has a somewhat related problem, I'll phrase it the way we've been
thinking of it: we love that key-value state stores can be backed up to topics,
but in our Streams application we want a much richer way of querying data than
just by key.

In a sense, {{range()}} partly solves this problem because it allows for a
different way of querying the store rather than just by your exact key.
But the real win would be a complete decoupling of the local state store
implementation from how it changelogs to Kafka. The store wouldn't need to be
just key-value with range, like RocksDB; it could have a fancier on-disk
structure that supports efficient querying or indexing of the data in many ways
(I'm thinking SQLite). It would definitely increase fail-over/restore time, but
that would be an acceptable/necessary tradeoff: if you're going to lay out the
data in a totally different format for querying, you obviously have to pay the
cost of translating the changelog into that format on restore.
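A minimal sketch of that tradeoff in plain Java, with no Kafka dependency. All names (`IndexedStoreSketch`, `Order`, `apply`, `restore`, `keysForCustomer`) are hypothetical, and in-memory maps stand in for a richer on-disk structure such as SQLite. The changelog stays an ordinary key/value stream with null values as tombstones, while the local store also maintains a secondary index by customer. Restoring means replaying every changelog record and rebuilding that index, which is the translation cost described above.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch (not a Kafka Streams API): a local store whose layout
// differs from its changelog. The changelog is a plain key/value stream;
// locally we also keep a secondary index so data can be queried by customer.
public class IndexedStoreSketch {

    // Hypothetical value type with an attribute we want to query by.
    static final class Order {
        final String customer;
        final long amountCents;

        Order(String customer, long amountCents) {
            this.customer = customer;
            this.amountCents = amountCents;
        }
    }

    // Primary data, keyed exactly as in the changelog.
    private final Map<String, Order> byKey = new HashMap<>();
    // Secondary index (customer -> order keys); it never appears in the changelog.
    private final Map<String, Set<String>> byCustomer = new HashMap<>();

    // Apply one changelog record; a null value is a tombstone (delete).
    void apply(String key, Order value) {
        Order old = (value == null) ? byKey.remove(key) : byKey.put(key, value);
        if (old != null) {
            // Drop the key from the index entry of the value it replaces.
            Set<String> keys = byCustomer.get(old.customer);
            keys.remove(key);
            if (keys.isEmpty()) {
                byCustomer.remove(old.customer);
            }
        }
        if (value != null) {
            byCustomer.computeIfAbsent(value.customer, c -> new TreeSet<>()).add(key);
        }
    }

    // Restore = replay the whole changelog, rebuilding the index record by record.
    // This replay is the "translation cost" paid on fail-over.
    static IndexedStoreSketch restore(List<Map.Entry<String, Order>> changelog) {
        IndexedStoreSketch store = new IndexedStoreSketch();
        for (Map.Entry<String, Order> rec : changelog) {
            store.apply(rec.getKey(), rec.getValue());
        }
        return store;
    }

    // A query that a pure key-value layout cannot answer without a full scan.
    Set<String> keysForCustomer(String customer) {
        return Collections.unmodifiableSet(
                byCustomer.getOrDefault(customer, Collections.emptySet()));
    }

    // Helper that builds a changelog record; SimpleEntry allows null values.
    static Map.Entry<String, Order> entry(String key, Order value) {
        return new AbstractMap.SimpleEntry<>(key, value);
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Order>> changelog = new ArrayList<>();
        changelog.add(entry("o1", new Order("alice", 500)));
        changelog.add(entry("o2", new Order("bob", 700)));
        changelog.add(entry("o3", new Order("alice", 900)));
        changelog.add(entry("o1", null)); // tombstone for o1
        IndexedStoreSketch store = restore(changelog);
        System.out.println(store.keysForCustomer("alice")); // prints [o3]
    }
}
```

The changelog itself never carries the index, so it stays compatible with a plain key-value consumer; the price is that every restore has to rebuild the index from scratch.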

What I'm proposing (completely decoupling the local state store implementation
from how it changelogs to Kafka) is more useful for Processor API users, but it
could also provide an API usable at the DSL level to enable what this JIRA is
asking for (merely decoupling the serdes of the local state store from those of
its changelog in Kafka).

> Decouple topic serdes from materialized serdes
> ----------------------------------------------
>
>                 Key: KAFKA-7777
>                 URL: https://issues.apache.org/jira/browse/KAFKA-7777
>             Project: Kafka
>          Issue Type: Wish
>          Components: streams
>            Reporter: Maarten
>            Priority: Minor
>              Labels: needs-kip
>
> It would be valuable to us to have the encoding format in a Kafka topic
> decoupled from the encoding format used to cache the data locally in a Kafka
> Streams app.
> We would like to use the `range()` function in the interactive queries API to
> look up a series of results, but can't with our encoding scheme because our
> keys are variable-length.
> We use protobuf, but based on what I've read, Avro, FlatBuffers and Cap'n
> Proto have similar problems.
> Currently we use the following code to work around this problem:
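> {code}
> // Deserialize with the topic serdes, then re-serialize into an intermediate
> // topic using the serdes we want for the local store
> builder
>     .stream("input-topic", Consumed.with(inputKeySerde, inputValueSerde))
>     .to("intermediate-topic", Produced.with(intermediateKeySerde, intermediateValueSerde));
> // Read the intermediate topic back as a table; t1Materialized configures the store
> t1 = builder
>     .table("intermediate-topic",
>         Consumed.with(intermediateKeySerde, intermediateValueSerde),
>         t1Materialized);
> {code}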
> With the encoding formats decoupled, the code above could be reduced to a
> single step, not requiring an intermediate topic.
> Based on feedback on my [SO question|https://stackoverflow.com/questions/53913571/is-there-a-way-to-separate-kafka-topic-serdes-from-materialized-serdes],
> a change that introduces this would impact state restoration when using an
> input topic for recovery.
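The ordering problem behind this issue can be illustrated in plain Java. Kafka Streams' persistent key-value stores compare keys by their serialized bytes, so `range()` only returns a meaningful interval when byte order matches the logical key order. The sketch below assumes a protobuf-style unsigned varint as a stand-in for the reporter's variable-length encoding (their actual key schema isn't given); `KeyOrderingSketch`, `varint`, `fixedWidth` and `sortByEncoding` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration: compare the byte-wise order of two key encodings
// against the numeric order of the keys they encode.
public class KeyOrderingSketch {

    // Protobuf-style unsigned varint: 7 bits per byte, least-significant group
    // first, high bit set on every byte except the last. Assumes v >= 0.
    static byte[] varint(long v) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
        return out.toByteArray();
    }

    // Fixed-width 8-byte big-endian encoding: for non-negative longs,
    // unsigned byte order equals numeric order.
    static byte[] fixedWidth(long v) {
        return ByteBuffer.allocate(Long.BYTES).putLong(v).array();
    }

    // Sort keys the way a byte-ordered store would: by unsigned
    // lexicographic comparison of their encoded bytes.
    static List<Long> sortByEncoding(List<Long> keys, Function<Long, byte[]> encode) {
        List<Long> sorted = new ArrayList<>(keys);
        sorted.sort((a, b) -> Arrays.compareUnsigned(encode.apply(a), encode.apply(b)));
        return sorted;
    }

    public static void main(String[] args) {
        List<Long> keys = List.of(1L, 129L, 200L, 256L, 300L);
        System.out.println(sortByEncoding(keys, KeyOrderingSketch::varint));     // [1, 256, 129, 300, 200]
        System.out.println(sortByEncoding(keys, KeyOrderingSketch::fixedWidth)); // [1, 129, 200, 256, 300]
    }
}
```

With the varint encoding, 256 sorts before 129 and 300 before 200, so a byte-wise range scan cannot return a numeric interval; the fixed-width encoding preserves the order for non-negative keys. Decoupling the serdes as this issue asks would let the topic keep its compact encoding while the local store uses an order-preserving one.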



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
