[jira] [Commented] (SAMZA-334) Need for asymmetric container config

Sriram Subramanian (JIRA) Tue, 15 Jul 2014 10:50:43 -0700

    [ 
https://issues.apache.org/jira/browse/SAMZA-334?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14062398#comment-14062398
 ]


Sriram Subramanian commented on SAMZA-334:
------------------------------------------

I would really like to see an existing system that has successfully implemented
auto scaling. If such a thing is possible, I think it would be useful to define
slices at the job level and let the framework decide how to distribute those
slices among the containers. The operational overhead of managing all the
configs and tweaking them based on observed behavior is huge. We certainly
cannot eliminate it for a framework like Samza, but we should think carefully
before introducing more configs that need lengthy documentation, which users
will almost always fail to read. We also need to think about how something
like Samza SQL could take advantage of these properties and decide on these
values dynamically.
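
As a rough illustration of the job-level slice idea, the sketch below assumes
that a "slice" is a fixed, indivisible unit of container resources (for
example, a fixed chunk of memory) and that a relative load figure is known for
each container. Neither this slice unit nor the function `distribute_slices`
exists in Samza; both are hypothetical. The sketch splits the job's total
slices in proportion to load, using largest-remainder rounding so the integer
shares always add up to the job total.

```python
def distribute_slices(total_slices: int, loads: dict[str, float]) -> dict[str, int]:
    """Split a hypothetical job-level budget of `total_slices` across
    containers in proportion to each container's (assumed known) load."""
    if total_slices < 0:
        raise ValueError("total_slices must be non-negative")
    total_load = sum(loads.values())
    if total_load <= 0:
        raise ValueError("at least one container must report positive load")

    # Ideal fractional share of the budget for each container.
    shares = {c: total_slices * load / total_load for c, load in loads.items()}
    # Start from the floor of each share (shares are non-negative)...
    alloc = {c: int(s) for c, s in shares.items()}
    leftover = total_slices - sum(alloc.values())
    # ...then give the remaining slices to the largest fractional remainders.
    by_remainder = sorted(shares, key=lambda c: shares[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return alloc


if __name__ == "__main__":
    print(distribute_slices(10, {"c0": 6.0, "c1": 3.0, "c2": 1.0}))
```

With loads in the ratio 6:3:1 and a 10-slice budget, each container gets 6, 3
and 1 slices respectively; the user would tune one number (the job total)
rather than one config per container.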

> Need for asymmetric container config
> ------------------------------------
>
>                 Key: SAMZA-334
>                 URL: https://issues.apache.org/jira/browse/SAMZA-334
>             Project: Samza
>          Issue Type: Improvement
>          Components: container
>    Affects Versions: 0.8.0
>            Reporter: Chinmay Soman
>
> The current (and upcoming) partitioning scheme(s) suggest that there might be
> a skew in the amount of data ingested and computation performed across
> different containers for a given Samza job. This directly affects the
> resources required by each container, which today are completely symmetric.
> Case A] Partitioning on Kafka partitions
> For instance, consider a partitioner job which reads data from different 
> Kafka topics (having different partition layouts). In this case, it's possible
> that many topics have a small number of Kafka partitions. Consequently, the
> containers processing the lower-numbered partitions (which exist in every
> topic) would need more resources than those responsible for the
> higher-numbered partitions.
> Case B] Partitioning based on Kafka topics
> Even in this case, it's very easy for some containers to be doing more work
> than others, leading to a skew in resource requirements.
> Today, the container config is based on the requirements of the worst-case
> container (the one doing the most work). Needless to say, this leads to
> resource wastage. A better approach would consider the true requirement per
> container (instead of per job).
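
To put rough numbers on Case A, here is a small sketch under simplifying
assumptions that are not stated in the issue: each container processes one
partition id across all input topics, every topic-partition contributes the
same load, and resources scale linearly with load. The helper names
`per_partition_load` and `provisioning` are hypothetical. The sketch compares
sizing every container for the busiest one (the current symmetric config) with
sizing each container for its own load.

```python
def per_partition_load(partition_counts: list[int]) -> list[int]:
    """Load on each partition id, counted as the number of topics that
    actually have that partition (assumes equal load per topic-partition)."""
    if not partition_counts:
        return []
    return [
        sum(1 for n in partition_counts if n > pid)  # topics owning partition pid
        for pid in range(max(partition_counts))
    ]


def provisioning(loads: list[int], units_per_load: int = 1) -> tuple[int, int]:
    """Return (symmetric, asymmetric) total resource units, assuming one
    container per partition id and resources linear in load."""
    if not loads:
        return 0, 0
    symmetric = max(loads) * len(loads) * units_per_load  # every container sized for the busiest
    asymmetric = sum(loads) * units_per_load              # each container sized for itself
    return symmetric, asymmetric


if __name__ == "__main__":
    loads = per_partition_load([8, 4, 2, 2])
    print(loads)
    print(provisioning(loads))
```

With topics of 8, 4, 2 and 2 partitions, the per-partition loads come out as
[4, 4, 2, 2, 1, 1, 1, 1], so symmetric sizing reserves 32 units where 16 would
suffice under these assumptions.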



--
This message was sent by Atlassian JIRA
(v6.2#6252)
