Sergey created FLINK-12294:
------------------------------

             Summary: kafka consumer, data locality
                 Key: FLINK-12294
                 URL: https://issues.apache.org/jira/browse/FLINK-12294
             Project: Flink
          Issue Type: Improvement
          Components: Connectors / Kafka, Runtime / Coordination
            Reporter: Sergey


Add a flag (default: false) that indicates whether the topic partitions are 
already grouped by key. When the flag is set to true, the unnecessary 
shuffle/re-sorting operation is skipped. As an example, say we have clients' 
payment transactions in a Kafka topic. The topic is partitioned by clientId 
(transactions with the same clientId go to the same Kafka topic partition), and 
the task is to find the maximum transaction per client in sliding windows. In 
map/reduce terms there is no need to shuffle data between all topic consumers. 
It may be worth doing the work within each consumer instead, and gaining some 
speedup by increasing the number of executors working on each partition's data. 
With N messages in a partition, this takes just N operations instead of 
N*ln(N) (the current implementation with shuffle/re-sorting). For windows with 
thousands of events, ln(N) is about 7 to 9, so the gain in execution speed is 
close to tenfold.
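
The idea can be sketched in plain Java, outside of Flink. This is only an 
illustration under stated assumptions: the class LocalMaxPerClient, the Payment 
type and both helper methods are hypothetical names, not Flink or Kafka APIs, 
and windowing is left out to keep the example short. It assumes that each 
clientId appears in exactly one partition, so each partition can be reduced in 
a single pass and the partial results combined without any shuffle.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; none of these names are Flink or Kafka APIs.
class LocalMaxPerClient {

    // A payment transaction: which client paid, and how much.
    static final class Payment {
        final String clientId;
        final long amount;

        Payment(String clientId, long amount) {
            this.clientId = clientId;
            this.amount = amount;
        }
    }

    // Single pass over one partition: N messages, N map updates, no sorting.
    static Map<String, Long> maxPerClient(List<Payment> partition) {
        Map<String, Long> max = new HashMap<>();
        for (Payment p : partition) {
            max.merge(p.clientId, p.amount, Long::max);
        }
        return max;
    }

    // Assumption: each clientId lives in exactly one partition, so the partial
    // results have disjoint keys and can be unioned without re-grouping.
    static Map<String, Long> union(List<Map<String, Long>> partials) {
        Map<String, Long> all = new HashMap<>();
        for (Map<String, Long> partial : partials) {
            for (Map.Entry<String, Long> e : partial.entrySet()) {
                if (all.putIfAbsent(e.getKey(), e.getValue()) != null) {
                    // The "already grouped by key" assumption was violated.
                    throw new IllegalStateException(
                            "clientId found in two partitions: " + e.getKey());
                }
            }
        }
        return all;
    }

    public static void main(String[] args) {
        List<Payment> partition0 = List.of(
                new Payment("alice", 40), new Payment("alice", 75),
                new Payment("carol", 10));
        List<Payment> partition1 = List.of(
                new Payment("bob", 20), new Payment("bob", 5));

        Map<String, Long> result = union(List.of(
                maxPerClient(partition0), maxPerClient(partition1)));
        System.out.println(result);
    }
}
```

The union step fails loudly if a clientId shows up in two partitions. That is 
the case the proposed flag would have to rule out, which is presumably why it 
should default to false.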



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
