[ https://issues.apache.org/jira/browse/SAMZA-334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062363#comment-14062363 ]
Chris Riccomini commented on SAMZA-334:
---------------------------------------
There's tension between ease of use and power. I will say that this is a
feature that we actually have needed in a real-world scenario. Granted,
SAMZA-123 would have fixed the problem for us.
The custom partitioning strategy could let developers alleviate pressure on
certain containers by explicitly moving partitions off them. In that case, you'd
only need this feature if a single partition on a single container still
requires too many resources. A workaround for single-partition/single-container
overload would be an upstream job that repartitions appropriately to reduce
pressure, though.
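
To make the upstream-repartitioning workaround concrete, here is a minimal
sketch in plain Python (not the Samza API). Everything in it is an assumption
made for illustration: the message shape, the "member" and "event_id" fields,
the partition count of 8, and the CRC32-based partitioner. It only shows how
choosing a finer-grained key in an upstream job spreads a hot key across more
partitions.

```python
import zlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    # Use a stable hash so the result is deterministic across runs
    # (Python's built-in hash() for str is salted per process).
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def repartition(messages, key_fn, num_partitions):
    """Count how many messages land in each output partition under key_fn."""
    counts = Counter()
    for msg in messages:
        counts[partition_for(key_fn(msg), num_partitions)] += 1
    return counts


# Hypothetical workload: a single "hot" member produces 90% of the events.
messages = [{"member": "hot", "event_id": i} for i in range(900)]
messages += [{"member": f"m{i}", "event_id": i} for i in range(100)]

# Keyed by member alone: every "hot" event lands in the same partition.
by_member = repartition(messages, lambda m: m["member"], 8)

# Keyed by member plus event id: the hot member's events are spread out.
by_member_and_event = repartition(
    messages, lambda m: f'{m["member"]}:{m["event_id"]}', 8
)

print("max partition load, keyed by member:", max(by_member.values()))
print("max partition load, keyed by member+event:", max(by_member_and_event.values()))
```

The trade-off in this sketch is that a finer key gives up per-member locality
and ordering downstream, so it only helps when the downstream job does not
depend on seeing all of one member's messages in a single partition.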
My thinking was more along the lines of, "Hey if we had this, we could support
an auto-scaling feature that manipulates the settings dynamically."
> Need for asymmetric container config
> ------------------------------------
>
> Key: SAMZA-334
> URL: https://issues.apache.org/jira/browse/SAMZA-334
> Project: Samza
> Issue Type: Improvement
> Components: container
> Affects Versions: 0.8.0
> Reporter: Chinmay Soman
>
> The current (and upcoming) partitioning scheme(s) suggest that there might be
> a skew in the amount of data ingested and computation performed across
> different containers for a given Samza job. This directly affects the amount
> of resources required by a container - which today are completely symmetric.
> Case A] Partitioning on Kafka partitions
> For instance, consider a partitioner job which reads data from different
> Kafka topics (having different partition layouts). In this case, it's possible
> that many of the topics have only a small number of Kafka partitions.
> Consequently, the containers processing the lower-numbered partitions (which
> exist in every topic) would need more resources than those responsible for
> the higher-numbered partitions.
> Case B] Partitioning based on Kafka topics
> Even in this case, it's very easy for some containers to be doing more work
> than others, leading to a skew in resource requirements.
> Today, the container config is based on the requirements of the worst-case
> container (the one doing the most work). Needless to say, this leads to
> resource wastage. A better approach would consider the true requirement per
> container (instead of per job).
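
The skew described in Case A, and the waste caused by sizing every container
for the worst case, can be sketched with a small calculation. This is an
illustration under stated assumptions, not Samza's actual grouping code: the
topic names and partition counts are made up, partition p of every topic is
assumed to go to container p % num_containers, and resource need is assumed to
be proportional to the number of stream partitions a container reads.

```python
from collections import defaultdict


def streams_per_container(topic_partitions, num_containers):
    """Count input stream partitions handled by each container.

    Assumption for illustration: partition p of every topic is assigned
    to container p % num_containers.
    """
    load = defaultdict(int)
    for _topic, count in topic_partitions.items():
        for p in range(count):
            load[p % num_containers] += 1
    return [load[c] for c in range(num_containers)]


def symmetric_waste(required):
    """Units over-allocated when every container gets the worst-case size."""
    worst = max(required)
    return sum(worst - r for r in required)


# Hypothetical layout: one wide topic and two narrow ones.
topics = {"topic-a": 8, "topic-b": 2, "topic-c": 2}

load = streams_per_container(topics, 4)
print(load)                   # [4, 4, 2, 2]
print(symmetric_waste(load))  # 4 units wasted out of 16 allocated
```

In this made-up layout, containers 0 and 1 each read twice as many stream
partitions as containers 2 and 3, so a symmetric config sized for the busiest
container over-allocates a quarter of the total.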