Re: [DISCUSS] KIP-326: Schedulable KTable as Graph source

Ted Yu Tue, 26 Jun 2018 13:23:42 -0700

What's the relationship between this KIP and KIP-323 ?

Thanks


On Tue, Jun 26, 2018 at 11:22 AM, Flávio Stutz <[email protected]>
wrote:

> Hey, guys, I've just created a new KIP about creating a new DSL graph
> source for realtime partitioned consolidations.
>
> We have faced the following scenario/problem in a lot of situations with
> KStreams:
>    - Huge incoming data being processed by numerous application instances
>    - Need to aggregate different fields whose records span all topic
> partitions (something like “total amount spent by people aged > 30 yrs”
> when processing a topic partitioned by userid).
>
> The challenge here is to manage this kind of situation without any
> bottlenecks. We don't need the “global aggregation” to be processed at each
> incoming message. On a scenario of 500 instances, each handling 1k
> messages/s, any single point of aggregation (single partitioned topics,
> global tables or external databases) would create a bottleneck of 500k
> messages/s for single threaded/CPU elements.
>
> For this scenario, it is possible to store the partial aggregations on
> local stores and, from time to time, query those states and aggregate them
> as a single value, avoiding bottlenecks. This is a way to create a "timed
> aggregation barrier”.
>
> If we leverage this kind of built-in feature we could greatly enhance the
> ability of KStreams to better handle the CAP Theorem characteristics, so
> that one could choose to have Consistency over Availability when needed.
>
> We started this discussion with Matthias J. Sax here:
> https://issues.apache.org/jira/browse/KAFKA-6953
>
> If you want to see more, go to KIP-326 at:
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> 326%3A+Schedulable+KTable+as+Graph+source
>
> -Flávio Stutz
>

Re: [DISCUSS] KIP-326: Schedulable KTable as Graph source

Reply via email to