What's the relationship between this KIP and KIP-323 ? Thanks
On Tue, Jun 26, 2018 at 11:22 AM, Flávio Stutz <flaviost...@gmail.com> wrote: > Hey, guys, I've just created a new KIP about creating a new DSL graph > source for realtime partitioned consolidations. > > We have faced the following scenario/problem in a lot of situations with > KStreams: > - Huge incoming data being processed by numerous application instances > - Need to aggregate different fields whose records span all topic > partitions (something like “total amount spent by people aged > 30 yrs” > when processing a topic partitioned by userid). > > The challenge here is to manage this kind of situation without any > bottlenecks. We don't need the “global aggregation” to be processed at each > incoming message. On a scenario of 500 instances, each handling 1k > messages/s, any single point of aggregation (single partitioned topics, > global tables or external databases) would create a bottleneck of 500k > messages/s for single threaded/CPU elements. > > For this scenario, it is possible to store the partial aggregations on > local stores and, from time to time, query those states and aggregate them > as a single value, avoiding bottlenecks. This is a way to create a "timed > aggregation barrier”. > > If we leverage this kind of built-in feature we could greatly enhance the > ability of KStreams to better handle the CAP Theorem characteristics, so > that one could choose to have Consistency over Availability when needed. > > We started this discussion with Matthias J. Sax here: > https://issues.apache.org/jira/browse/KAFKA-6953 > > If you want to see more, go to KIP-326 at: > https://cwiki.apache.org/confluence/display/KAFKA/KIP- > 326%3A+Schedulable+KTable+as+Graph+source > > -Flávio Stutz >