Chinmay Soman created SAMZA-334:
-----------------------------------
Summary: Need for asymmetric container config ?
Key: SAMZA-334
URL: https://issues.apache.org/jira/browse/SAMZA-334
Project: Samza
Issue Type: Improvement
Components: container
Reporter: Chinmay Soman
The current (and upcoming) partitioning scheme(s) suggest that there might be a
skew in the amount of data ingested and computation performed across different
containers for a given Samza job. This directly affects the amount of resources
required by a container - which today are completely symmetric.
Case A] Partitioning on Kafka partitions
For instance, consider a partitioner job which reads data from different Kafka
topics (having different partition layouts). In this case, its possible that a
lot of topics have a smaller number of Kafka partitions. Consequently the
containers processing these partitions would need more resources than those
responsible for the higher numbered partitions.
Case B] Partitioning based on Kafka topics
Even in this case, its very easy for some containers to be doing more work than
others - leading to a skew in resource requirements.
Today, the container config is based on the requirements for the worst (doing
the most work) container. Needless to say, this leads to resource wastage. A
better approach needs to consider what is the true requirement per container
(instead of per job).
--
This message was sent by Atlassian JIRA
(v6.2#6252)