Chinmay Soman created SAMZA-334:
-----------------------------------

             Summary: Need for asymmetric container config ?
                 Key: SAMZA-334
                 URL: https://issues.apache.org/jira/browse/SAMZA-334
             Project: Samza
          Issue Type: Improvement
          Components: container
            Reporter: Chinmay Soman


The current (and upcoming) partitioning schemes suggest that the amount of data 
ingested and computation performed can be skewed across the containers of a 
given Samza job. This directly affects the resources each container requires, 
yet today every container in a job gets the same (symmetric) resource config.

Case A] Partitioning on Kafka partitions
For instance, consider a partitioner job which reads data from several Kafka 
topics with different partition counts. In this case, it's possible that many 
topics have only a small number of Kafka partitions, so the low-numbered 
partitions exist in far more topics than the high-numbered ones. Consequently, 
the containers processing the low-numbered partitions would need more resources 
than those responsible for the higher-numbered partitions.
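
A minimal sketch of the skew in Case A, in Python. It assumes, purely for 
illustration, that partition p of every topic is handled by container 
p % num_containers; the topic names and partition counts are hypothetical, and 
the function is not part of Samza.

```python
from collections import Counter


def streams_per_container(topic_partition_counts, num_containers):
    """Count how many topic-partitions land on each container.

    Assumption (illustrative only): partition p of every topic is
    assigned to container p % num_containers.
    """
    load = Counter({c: 0 for c in range(num_containers)})
    for _topic, partition_count in topic_partition_counts.items():
        for p in range(partition_count):
            load[p % num_containers] += 1
    return dict(load)


# Hypothetical topics: one wide topic and several narrow ones.
topics = {"topic_a": 8, "topic_b": 2, "topic_c": 2, "topic_d": 1}
load = streams_per_container(topics, num_containers=8)
print(load)
```

With these made-up inputs, container 0 reads four topic-partitions while 
containers 2 through 7 read one each, even though all of them would be given 
the same resources.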

Case B] Partitioning based on Kafka topics
Even in this case, it's very easy for some containers to be doing more work than 
others, leading to a skew in resource requirements.

Today, the container config is sized for the worst-case container (the one 
doing the most work) and applied to every container in the job. This wastes 
resources on the lightly loaded containers. A better approach would consider 
the true requirement of each container (instead of one requirement per job).
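
To make the wastage concrete, here is a rough Python sketch that compares 
symmetric sizing against per-container sizing. The memory figures are 
hypothetical, and the helper is only an illustration, not an existing Samza 
API.

```python
def wasted_memory_mb(per_container_need_mb):
    """Memory over-allocated when every container is sized for the worst one."""
    symmetric_size = max(per_container_need_mb)  # today's per-job config
    allocated = symmetric_size * len(per_container_need_mb)
    needed = sum(per_container_need_mb)  # asymmetric, per-container config
    return allocated - needed


# Hypothetical per-container memory requirements in MB.
needs = [4096, 3072, 1024, 1024]
waste = wasted_memory_mb(needs)
print(waste)
```

For these made-up numbers the job is allocated 16384 MB while the containers 
need 9216 MB in total, so 7168 MB is reserved but unused.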



--
This message was sent by Atlassian JIRA
(v6.2#6252)
