Really digging Kafka Streams so far, nice work all. I'm interested in
being able to materialize one or more KTables in full before the rest
of the topology begins processing messages. This seems fundamentally
useful, since it lets you replicate your database tables from the
Connect change-stream topics before the stream processing workload
starts.

In Samza we have bootstrap streams and stream prioritization to help
facilitate this. What seems desirable for Kafka Streams is:

- Per-source prioritization (the default priority would be >0; setting
a source's priority to 0 effectively makes it a bootstrap stream)
- Per-source initial offset settings (earliest or latest, defaulting to
latest)

To solve the KTable materialization problem, you'd set priority to 0
for its source and the source offset setting to earliest.
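
As a rough illustration of that selection rule, here is a minimal
sketch in plain Java. It is not Kafka Streams API: the names
SourcePrioritySketch, Source, and nextToPoll are made up for this
example, as are the topic names. It assumes that lower priority numbers
are consumed first and that a source counts as "caught up" once its
read position reaches the end offset.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical model only: none of these names exist in Kafka Streams.
class SourcePrioritySketch {

    static final class Source {
        final String topic;
        final int priority;   // 0 = bootstrap; lower numbers are consumed first
        final long position;  // next offset to read (0 when starting from earliest)
        final long endOffset; // current end of the topic

        Source(String topic, int priority, long position, long endOffset) {
            this.topic = topic;
            this.priority = priority;
            this.position = position;
            this.endOffset = endOffset;
        }

        boolean hasBacklog() {
            return position < endOffset;
        }
    }

    // Picks the source with the lowest priority number that still has unread
    // records, so a priority-0 source is drained before anything else runs.
    static Optional<Source> nextToPoll(List<Source> sources) {
        return sources.stream()
                .filter(Source::hasBacklog)
                .min(Comparator.comparingInt(s -> s.priority));
    }

    public static void main(String[] args) {
        List<Source> sources = List.of(
                new Source("orders", 1, 0L, 100L),
                new Source("users-changelog", 0, 0L, 50L));
        // The KTable's source has priority 0 and a backlog, so it is chosen.
        System.out.println(nextToPoll(sources).map(s -> s.topic).orElse("(idle)"));
    }
}
```

Once the priority-0 source has no backlog left, the same call falls
through to the remaining sources, which is the bootstrap behavior
described above.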

Right now it appears the only control you have for re-processing is
AUTO_OFFSET_RESET_CONFIG, but I believe this is a global setting for
the consumers, and hence, the entire job. Beyond that, I don't see any
way to prioritize stream consumption at all, so your KTables will be
getting materialized while the general stream processing work is
running concurrently.
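
For reference, a minimal sketch of what that single knob looks like,
using plain java.util.Properties and the literal key behind
AUTO_OFFSET_RESET_CONFIG ("auto.offset.reset"). The class name and the
broker address are placeholders for this example. If the setting is
indeed global, as I believe, then the one value below applies to every
source topic in the job, KTable sources and ordinary streams alike.

```java
import java.util.Properties;

// Illustrative only: the class name and broker address are placeholders.
class GlobalOffsetResetSketch {

    static Properties jobProps() {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder
        // One value for the whole job; there is no per-source variant here.
        props.setProperty("auto.offset.reset", "earliest");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(jobProps().getProperty("auto.offset.reset"));
    }
}
```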

I wanted to see if this case is actually supported already and I'm
missing something, or if not, if these options make sense. If this
seems reasonable and it's not too complicated, I could possibly try to
get together a patch. If so, any tips on implementing this would be
helpful as well. Thanks!

-Greg
