Pramod Immaneni created APEXCORE-348:
----------------------------------------
Summary: Load based stream partitioning
Key: APEXCORE-348
URL: https://issues.apache.org/jira/browse/APEXCORE-348
Project: Apache Apex Core
Issue Type: Improvement
Reporter: Pramod Immaneni
Assignee: Pramod Immaneni

In some scenarios the downstream partitions of an upstream operator do not
perform uniformly, and overall performance is then sub-optimal because it is
dictated by the slowest partitions. The cause may be data related, for example
some partitions receive more data to process than others, or environment
related, for example some partitions run slower than others because they are
on heavily loaded nodes.

A solution based on functionality currently available in the engine would be to
write a StreamCodec implementation that distributes data among the partitions
so that each partition receives a similar amount of data to process. We should
consider adding StreamCodecs like these to the library. They do not, however,
solve the problem when it is environment related.
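
As a rough illustration of the even-distribution idea, the sketch below assigns
tuples to partitions in round-robin order. It is a standalone class, not an
implementation of the engine's StreamCodec interface: the class name
RoundRobinPartitioner, its constructor argument, and the getPartition method
shown here are assumptions made for the example, and how the returned index
would map onto real partition keys is left out.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch, not the Apex StreamCodec API.
// Spreads tuples evenly by cycling through partition indexes.
class RoundRobinPartitioner {
    private final int partitionCount;
    private final AtomicInteger next = new AtomicInteger();

    RoundRobinPartitioner(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        this.partitionCount = partitionCount;
    }

    // Ignores the tuple contents, so skew in the data cannot skew the load.
    // floorMod keeps the result non-negative if the counter overflows.
    int getPartition(Object tuple) {
        return Math.floorMod(next.getAndIncrement(), partitionCount);
    }
}
```

A scheme like this only equalizes the amount of data sent. It has no knowledge
of how fast each partition is actually consuming.
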

For that, a better and more comprehensive approach would be to look at how data
is being consumed by the downstream partitions from the BufferServer and use
that information to decide how to send future data. If some partitions are
behind others in consuming data, then data can be directed to the other
partitions. One way to do this would be to relay this statistical and
positional information from the BufferServer to the upstream publishers. The
publishers can then use it, for example by making it available to StreamCodecs
so that they can influence the destination of future data.
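
One possible shape for the publisher-side decision is sketched below. It
assumes the BufferServer can report, per partition, how many tuples that
partition has consumed so far; no such feedback channel is described as
existing today. The class LoadAwarePartitionChooser and its methods
reportConsumed, choose, and backlog are hypothetical names used only for this
example.

```java
// Hypothetical sketch of load-aware routing on the publisher side.
// Assumes consumption counts arrive from the BufferServer by some
// feedback mechanism that is not shown here.
class LoadAwarePartitionChooser {
    private final long[] published; // tuples sent to each partition
    private final long[] consumed;  // tuples each partition has read, as last reported

    LoadAwarePartitionChooser(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        published = new long[partitionCount];
        consumed = new long[partitionCount];
    }

    // Called when a consumption report for one partition is received.
    void reportConsumed(int partition, long count) {
        consumed[partition] = count;
    }

    // Tuples sent to the partition but not yet consumed by it.
    long backlog(int partition) {
        return published[partition] - consumed[partition];
    }

    // Picks the partition with the smallest backlog (lowest index on ties)
    // and records that one more tuple has been sent to it.
    int choose() {
        int best = 0;
        for (int i = 1; i < published.length; i++) {
            if (backlog(i) < backlog(best)) {
                best = i;
            }
        }
        published[best]++;
        return best;
    }
}
```

With no reports the chooser behaves like an even distribution. Once a
partition is reported as caught up, new tuples go to it until its backlog
matches the others, so a partition on a heavily loaded node naturally receives
less data.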