Did reply on SO.

-Matthias

On 1/24/24 2:18 AM, warrior2...@gmail.com wrote:
Let's say there's a topic in which chunks of different files are all mixed up represented by a tuple |(FileId, Chunk)|.

Chunks of a same file also can be a little out of order.

The task is to aggregate all files and store them into some store.

The number of files is unbound.

In pseudo stream DSL that might look like

|topic('chunks') .groupByKey((fileId, chunk) -> fileId) .sortBy((fileId, chunk) -> chunk.offset) .aggregate((fileId, chunk) -> store.append(fileId, chunk)); |

I want to understand whether kafka streams can solve this efficiently. Since the number of files is unbound how would kafka manage intermediate topics for groupBy operation? How many partitions will it use etc? Can't find this details in the docs. Also let's say chunk has a flag that indicates EOF. How to indicate that specific group will no longer have any new data?


That’s a copy of my stack overflow question.
apple-touch-i...@2.png
What does kafka streams groupBy does internally? <https://stackoverflow.com/questions/77870807/what-does-kafka-streams-groupby-does-internally> stackoverflow.com <https://stackoverflow.com/questions/77870807/what-does-kafka-streams-groupby-does-internally>

<https://stackoverflow.com/questions/77870807/what-does-kafka-streams-groupby-does-internally>


—
Michael

Reply via email to