[CONF] Apache Samza > SEP-22: Container Placements in Samza

2019-11-10 Thread Sanil Jain (Confluence)
 
There's 1 new edit on this page 
 
SEP-22: Container Placements in Samza 
 
Sanil Jain edited this page 
 
Here's what changed: 
 
Status
Current state: [ UNDER DISCUSSION ]
Discussion thread:
JIRA: SAMZA-TBD
Released:

Problem
Samza operates in a multi-tenant environment with cluster managers like YARN and Mesos, where a single host can run multiple Samza containers. Cluster managers like YARN are often configured with soft limits, and Samza has no notion of dynamic workload balancing. As a result, a host can end up underperforming, and it becomes desirable to move one or more containers from that host to other hosts. Today this is not possible without affecting other jobs on the hot host or manually restarting the affected job. Other use cases also call for restarting a single container or a subset of containers without restarting the whole job, such as resetting the checkpoints of a single container or supporting canary or rolling bounces.

This doc addresses the problem of restarting or moving a single container of a job without affecting the other containers of the same job or other jobs running on the same host.

Motivation
Alleviating Hot Host Problems: YARN, as the resource manager for Samza, is configured to operate on soft limits. The vcores a job sets for a container, or the default if it sets none, are only the minimum resources YARN guarantees, and most customers keep the defaults and fail to right-size their jobs. In addition, there is no notion of a dynamic cluster balancer in Samza at LinkedIn today, although YARN acts as a cluster balancer to some extent if configured with hard limits. Because of this, a host often ends up with underperforming containers (a "hot host") because CPU-heavy containers run on it while other hosts are underutilized. In that case it is desirable to move a container to a different host. Today, the following workarounds exist:
 
1. Rewrite the locality mapping in the coordinator stream and restart the job.
2. Take the hot host out of rotation, which kills the containers of every job running on that host. Then trigger a restart of the job whose container should move, so that YARN tries to allocate a different host for it. Once the container starts on another host, put the hot host back in rotation so that the other containers killed when the host was taken out can be restarted on it. This is far from easy.
If the ability to move a container exists, someone can, at the simplest, manually move containers to different hosts or write simple scripts to automate that.

Canary / Rolling Bounces: When a bug in the Samza framework code affects a Samza container deployment, a Samza engineer needs to manually restart the container process on the given machine with the given binary version. This involves multiple steps (e.g., manually identifying and logging in to the container's host before using kill -9 to stop the process), which is inconvenient. With the restart ability, the same system can be used to build support for canary or rolling bounces for YARN-based Samza deployments. The restart ability can also be extended to deploy a single container or a subset of containers using a different version of the application code.

Resetting Checkpoints: The Startpoint API has made resetting checkpoints easy, but a developer still needs to restart the job after setting the startpoints. That full restart could potentially be avoided by restarting only a single container or a few containers.

Draining a host: Moving all running containers from a host to other hosts, either sequentially or in parallel.

Fix a Job in Degraded State: Users often want to restart just a single container. For example, only a few containers may be underperforming or running into exceptions because some partitions have corrupt messages. Today Samza kills the job if a container has run into exceptions more than a fixed, configured number of times. In these scenarios, it is desirable for users to keep the job running in a degraded state, fix only one or a few containers, and issue a restart for them.

Dynamic Workload Balancer: The ability to move and restart containers is the fundamental building block for developing a load balancer (like Cruise Control) for Samza. At its simplest, this load balancer can be a script that tries to balance the cluster; later it can be built into a more sophisticated system.

Heterogeneous container: Each Samza job today has homogeneous containers (with regard to memory and vcore configuration). One of the desired future use cases for Samza is the ability to restart a container with a different size (memory and CPU), which can be used by ay

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2019-12-03 Thread Ke Wu (Confluence)
 
There's 1 new inline comment on this page 
 
Ke Wu  

Samza Metastore 
 
 Is this available now?
 
This message was sent by Atlassian Confluence 6.15.8  

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-06 Thread Prateek Maheshwari (Confluence)
 
There are 3 new inline comments on this page
 
Prateek Maheshwari  

StandBy Container: 
 
 How is this different than 2b above (stateful + standby)?  
 
Prateek Maheshwari  

populated by the client 
 
 How would the client know whether the uuid is unique?  
 
Prateek Maheshwari  

destination-host 
 
 Required or optional? If optional, what's the behavior if missing?  

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-06 Thread Prateek Maheshwari (Confluence)
 
There are 11 new inline comments on this page
 
Prateek Maheshwari  

request-expiry-timeout 
 
 Why does this need to be specified / overridden by the client?  
 
Prateek Maheshwari  

CREATED 
 
 What's the difference between CREATED and ACCEPTED?
 
Prateek Maheshwari  

processorId 
 
 processorId or uuid? Incomplete statement.  
 
Prateek Maheshwari  

user 
 
 Rename to application-version; users don't come with versions.
 
Prateek Maheshwari  

UNAUTHORIZED 
 
 No BAD_REQUEST status? E.g., what if one of the IDs is invalid? Why is this status needed, by the way? Is this request async as well?
 
Prateek Maheshwari  

registerContainerPlacementAction 
 
 placeContainer  
 
Prateek Maheshwari  

processor-id: 
 
 No appId / uuid?  
 
Prateek Maheshwari  

ACCEPTED 
 
 Same as above.  
 
Prateek Maheshwari  

Control Plane as described plane above the job 
 
 Can you clarify this sentence? Not clear what this means.  
 
Prateek Maheshwari  

handler registered to control plane 
 
 What does this mean?  
 
Prateek Maheshwari  

maintaining some in-memory state with Container Placement Service 
 
 Whose responsibility is it to dedup requests? CPH or CPS? If CPH, why is the state kept in CPS?

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-06 Thread Prateek Maheshwari (Confluence)
 
There are 6 new inline comments on this page
 
Prateek Maheshwari  

need to be deleted 
 
 How / where does this happen? E.g., do you have a default TTL + timestamp in the control message where CPH rejects + deletes them if they're too old?  
 
Prateek Maheshwari  

No need to build Authentication since AM runs on hosts which are blacklisted for anyone except the Samza Team 
 
 LinkedIn-specific detail, not true in general.
 
Prateek Maheshwari  

individual namespaces 
 
 Are requests and responses separate namespaces? If so, why?  
 
Prateek Maheshwari  

processorId 
 
 Where are the UUID and deployment ID? In the payload? Can you document the message structure (key and value)?
 
Prateek Maheshwari  

delete messages 
 
 Who does this?  
 
Prateek Maheshwari  

Open Sources Access: 
 
 Fix this section.  

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-06 Thread Prateek Maheshwari (Confluence)
 
There are 3 new comments and 1 new inline comment on this page
 
3 new comments 
 
Prateek Maheshwari  

 ContainerPlacementRequestMessage 
 

 How is this different from the class above? 
 
Prateek Maheshwari  

 ContainerPlacementHandler 
 

 Why would tools use ContainerPlacementHandler to write ContainerPlacementRequestMessage?
 
Prateek Maheshwari  

 ContainerManager 
 

 What does handleExpiredRequestForControlActionOrHostAffinityEnabled mean?
 
1 new inline comment 
 
Prateek Maheshwari  

References 
 
 Remove references to internal docs.  

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-06 Thread Sanil Jain (Confluence)
 
There are 5 new inline comments on this page
 
Prateek Maheshwari  

StandBy Container: 
 
 How is this different than 2b above (stateful + standby)?  
 
Sanil Jain  
 
 2b places an active container on its selected standby; this case moves a standby container to a new host.
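The placement targets discussed in this thread can be summarized as a small validation step. This is a minimal sketch under stated assumptions, not Samza's API. The names `ContainerRole` and `DestinationResolver` are hypothetical. Rejecting combinations that the placeContainer description does not list (ANY_HOST for stateful jobs, STANDBY for stateless jobs) is an assumption, and so is requiring a concrete host when moving a standby container. The ANY_HOST and STANDBY literals and the rule that destination-host is required come from this SEP.

```java
import java.util.Optional;

// Hypothetical sketch, not Samza's API: checks whether a destination-host value
// is acceptable for the container being placed.
enum ContainerRole { ACTIVE, STANDBY }

final class DestinationResolver {
  static final String ANY_HOST = "ANY_HOST";
  static final String STANDBY = "STANDBY";

  /** Returns a rejection reason, or Optional.empty() if the destination is acceptable. */
  static Optional<String> validate(ContainerRole role, boolean statefulJob, String destinationHost) {
    if (destinationHost == null || destinationHost.isEmpty()) {
      return Optional.of("destination-host is required");
    }
    boolean anyHost = ANY_HOST.equals(destinationHost);
    boolean standby = STANDBY.equals(destinationHost);
    if (role == ContainerRole.STANDBY) {
      // Assumption: moving a standby container needs a concrete target host.
      if (anyHost || standby) {
        return Optional.of("a standby container must be moved to a concrete host");
      }
      return Optional.empty();
    }
    if (anyHost && statefulJob) {
      return Optional.of("ANY_HOST is only listed for stateless jobs");
    }
    if (standby && !statefulJob) {
      return Optional.of("STANDBY is only listed for stateful jobs");
    }
    return Optional.empty(); // a concrete hostname is accepted for active containers
  }
}
```

Returning a reason instead of a boolean gives the caller something to report back with a rejected request.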
 
Prateek Maheshwari  

populated by the client 
 
 How would the client know whether the uuid is unique?  
 
Sanil Jain  
 
 The client side has to do some bookkeeping, since this UUID is also required to query the status of the control action.
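One common way to address the uniqueness question, assumed here rather than specified by the SEP, is for the client to generate a random (version 4) UUID with `java.util.UUID.randomUUID()`, whose collision probability is negligible. The client then keeps a local map from that UUID to the processorId it targets, so it can later query the request's status. `PlacementClientLedger` and its methods are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical client-side bookkeeping for placement requests.
final class PlacementClientLedger {
  private final Map<UUID, String> processorIdByRequest = new HashMap<>();

  /** Generates a random UUID for a new request and remembers its target processor. */
  UUID newRequest(String processorId) {
    UUID uuid = UUID.randomUUID();
    // Only guards against collisions with this client's own requests;
    // global uniqueness relies on the randomness of version 4 UUIDs.
    while (processorIdByRequest.containsKey(uuid)) {
      uuid = UUID.randomUUID();
    }
    processorIdByRequest.put(uuid, processorId);
    return uuid;
  }

  /** Returns the processorId a request was issued for, or null if unknown. */
  String processorIdFor(UUID uuid) {
    return processorIdByRequest.get(uuid);
  }

  int outstandingRequests() {
    return processorIdByRequest.size();
  }
}
```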
 
Prateek Maheshwari  

destination-host 
 
 Required or optional? If optional, what's the behavior if missing?  
 
Sanil Jain  
 
 It's required; any parameter not marked "[optional]" is required.
 
Prateek Maheshwari  

CREATED 
 
 What's the difference between CREATED and ACCEPTED?
 
Sanil Jain  
 
 A request is in CREATED status when it has been issued and is waiting to be processed by the AM, e.g., when a ContainerPlacementRequestMessage has been written to the metastore by a client-side tool.
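The CREATED/ACCEPTED distinction above can be sketched as a one-step state machine. This sketch assumes the AM moves a request out of CREATED exactly once, either to ACCEPTED or to a rejection code. Only CREATED, ACCEPTED, BAD_REQUEST and UNAUTHORIZED are named in this discussion. Later states are deliberately left out, and `PlacementRequestState` with its methods is hypothetical.

```java
// Hypothetical status codes: only the values named in this discussion are included.
enum PlacementStatus { CREATED, ACCEPTED, BAD_REQUEST, UNAUTHORIZED }

final class PlacementRequestState {
  // A request written to the metastore by a client-side tool starts out CREATED.
  private PlacementStatus status = PlacementStatus.CREATED;

  PlacementStatus status() {
    return status;
  }

  /** The AM picked the request up and will act on it. */
  boolean accept() {
    return transitionFromCreated(PlacementStatus.ACCEPTED);
  }

  /** The AM picked the request up but refuses it with a rejection code. */
  boolean reject(PlacementStatus code) {
    if (code != PlacementStatus.BAD_REQUEST && code != PlacementStatus.UNAUTHORIZED) {
      throw new IllegalArgumentException("not a rejection code: " + code);
    }
    return transitionFromCreated(code);
  }

  // Only a CREATED request may move on; returns false if it was already handled.
  private boolean transitionFromCreated(PlacementStatus next) {
    if (status != PlacementStatus.CREATED) {
      return false;
    }
    status = next;
    return true;
  }
}
```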
 
Prateek Maheshwari  

registerContainerPlacementAction 
 
 placeContainer  
 
Sanil Jain  
 
 sure  

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-06 Thread Sanil Jain (Confluence)
 
There are 3 new inline comments on this page
 
Prateek Maheshwari  

request-expiry-timeout 
 
 Why does this need to be specified / overridden by the client?  
 
Sanil Jain  
 
 Since most jobs do not set the request expiry timeout, it defaults to 5 seconds. This parameter makes it possible to override that timeout in case requests to the cluster manager take longer to return. Since we do not have performance benchmarks for such timeouts on the default cluster manager (e.g., YARN), this flexibility might be desirable.
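The timeout resolution described in this reply can be sketched as follows. The 5-second default comes from the reply. The class name `ExpiryTimeouts`, the rejection of non-positive overrides, and the expiry check based on elapsed milliseconds are assumptions made for illustration.

```java
import java.time.Duration;
import java.util.Optional;

// Hypothetical helper: resolves the effective timeout for resource requests
// made on behalf of one placement request.
final class ExpiryTimeouts {
  // Default stated in this discussion: 5 seconds.
  static final Duration DEFAULT_REQUEST_EXPIRY = Duration.ofSeconds(5);

  /** Uses the client's request-expiry-timeout if present, otherwise the default. */
  static Duration effective(Optional<Duration> override) {
    if (!override.isPresent()) {
      return DEFAULT_REQUEST_EXPIRY;
    }
    Duration d = override.get();
    if (d.isNegative() || d.isZero()) {
      throw new IllegalArgumentException("request-expiry-timeout must be positive: " + d);
    }
    return d;
  }

  /** True once a resource request issued at issuedAtMs has outlived the timeout. */
  static boolean isExpired(long issuedAtMs, long nowMs, Duration timeout) {
    return nowMs - issuedAtMs > timeout.toMillis();
  }
}
```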
 
Prateek Maheshwari  

user 
 
 Rename to application-version; users don't come with versions.
 
Sanil Jain  
 
 sure  
 
Prateek Maheshwari  

UNAUTHORIZED 
 
 No BAD_REQUEST status? E.g., what if one of the IDs is invalid? Why is this status needed, by the way? Is this request async as well?
 
Sanil Jain  
 
 Good catch; we only need BAD_REQUEST and UNAUTHORIZED.
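Prateek's question about invalid IDs can be made concrete with a validation step that maps problems to the two rejection codes agreed on here. This is a sketch under assumptions. Checking the uuid, the processorId and the applicationId in this order is illustrative, and so is treating every unknown or missing ID as BAD_REQUEST. How a caller is judged authorized is out of scope, so it is passed in as a flag. `ValidationOutcome` and `PlacementRequestValidator` are hypothetical names.

```java
import java.util.Set;

// Hypothetical validation outcome: BAD_REQUEST and UNAUTHORIZED are the two
// rejection codes agreed on in this thread; OK means "accept".
enum ValidationOutcome { OK, BAD_REQUEST, UNAUTHORIZED }

final class PlacementRequestValidator {
  private final Set<String> knownProcessorIds;
  private final String currentApplicationId;

  PlacementRequestValidator(Set<String> knownProcessorIds, String currentApplicationId) {
    this.knownProcessorIds = knownProcessorIds;
    this.currentApplicationId = currentApplicationId;
  }

  ValidationOutcome validate(String processorId, String applicationId, String uuid,
      boolean callerAuthorized) {
    if (!callerAuthorized) {
      return ValidationOutcome.UNAUTHORIZED;
    }
    // Any missing or unknown ID makes the request malformed.
    if (uuid == null || uuid.isEmpty()) {
      return ValidationOutcome.BAD_REQUEST;
    }
    if (processorId == null || !knownProcessorIds.contains(processorId)) {
      return ValidationOutcome.BAD_REQUEST;
    }
    if (!currentApplicationId.equals(applicationId)) {
      return ValidationOutcome.BAD_REQUEST; // targets a different deployment
    }
    return ValidationOutcome.OK;
  }
}
```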

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-06 Thread Sanil Jain (Confluence)
 
There are 4 inline comment updates on this page
 
Prateek Maheshwari  

processorId 
 
 processorId or uuid? Incomplete statement.  
 
Sanil Jain  
 
 Since this is an async API, nothing is returned; the status of the request can be queried by processorId.
 
Sanil Jain resolved 

user 
 
 Rename to application-version; users don't come with versions.
 
Prateek Maheshwari  

Control Plane as described plane above the job 
 
 Can you clarify this sentence? Not clear what this means.  
 
Sanil Jain  
 
 Changed it to: "Control Plane is a channel outside the job that allows taking control actions by multiple controllers, such as the Samza Dashboard and the Startpoints controller."
 
Prateek Maheshwari  

handler registered to control plane 
 
 What does this mean?  
 
Sanil Jain  
 
 Changed this to: "ContainerPlacementHandler is a stateless handler registered to the control plane that dispatches placement actions to invoke Container Placement Service APIs."

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-06 Thread Sanil Jain (Confluence)
 
There are 3 new inline comments on this page
 
Prateek Maheshwari  

maintaining some in-memory state with Container Placement Service 
 
 Whose responsibility is it to dedup requests? CPH or CPS? If CPH, why is the state kept in CPS?
 
Sanil Jain  
 
 This is outdated: CPH is stateless, and deduplication is the duty of the Container Placement Service (CPS). Removing it.
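With the handler stateless, deduplication state lives in the CPS. A minimal sketch, assuming requests are deduplicated by their client-supplied uuid and that an in-memory set is acceptable. That state is lost when the AM restarts. `PlacementDeduplicator` is a hypothetical name.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical in-memory dedup kept inside the Container Placement Service,
// so that the handler in front of it can stay stateless.
final class PlacementDeduplicator {
  private final Set<UUID> seenRequestIds = new HashSet<>();

  /** Returns true the first time a request uuid is seen, false for a duplicate. */
  synchronized boolean firstTime(UUID requestId) {
    return seenRequestIds.add(requestId);
  }
}
```

`Set.add` already reports whether the element was new, so checking and recording happen in one step. Making the method `synchronized` keeps that step atomic if several threads hand requests to the service.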
 
Prateek Maheshwari  

need to be deleted 
 
 How / where does this happen? E.g., do you have a default TTL + timestamp in the control message where CPH rejects + deletes them if they're too old?  
 
Sanil Jain  
 
 Messages can be deleted across restarts, i.e., on startup. In addition, the client-side tool will be able to delete messages by writing directly to the metastore.
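The startup cleanup mentioned in this reply could look roughly like the sketch below. The reply does not say which messages are deleted on startup. This sketch assumes that messages tagged with an applicationId other than the current deployment's are stale. `KeyValueMetastore` is a hypothetical minimal interface, not Samza's metadata store API, and the stored value is reduced to just the applicationId for brevity.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical minimal view of the metastore; Samza's real metadata store API may differ.
interface KeyValueMetastore {
  Map<String, String> all();   // message key -> applicationId of the stored message
  void delete(String key);
}

final class InMemoryMetastore implements KeyValueMetastore {
  final Map<String, String> entries = new HashMap<>();

  public Map<String, String> all() {
    return new HashMap<>(entries); // snapshot, so callers may delete while iterating
  }

  public void delete(String key) {
    entries.remove(key);
  }
}

final class StartupCleanup {
  /** Deletes messages left over from other deployments and returns their keys. */
  static List<String> purgeStale(KeyValueMetastore store, String currentApplicationId) {
    List<String> deleted = new ArrayList<>();
    for (Map.Entry<String, String> e : store.all().entrySet()) {
      if (!currentApplicationId.equals(e.getValue())) {
        store.delete(e.getKey());
        deleted.add(e.getKey());
      }
    }
    return deleted;
  }
}
```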
 
Prateek Maheshwari  

No need to build Authentication since AM runs on hosts which are blacklisted for anyone except the Samza Team 
 
 LinkedIn-specific detail, not true in general.
 
Sanil Jain  
 
 Changing it here, thanks!

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-06 Thread Sanil Jain (Confluence)
 
There are 3 new comments and 3 new edits on this page
 
3 new comments 
 
Prateek Maheshwari  

 ContainerPlacementRequestMessage 
 

 How is this different from the class above? 
 
Sanil Jain  
 
 It's exactly the same as ContainerPlacementMessage, but in the future this class can easily be evolved to include more parameters.
 
Prateek Maheshwari  

 ContainerPlacementHandler 
 

 Why would tools use ContainerPlacementHandler to write ContainerPlacementRequestMessage?
 
Sanil Jain  
 
 Shall I abstract it out as another util?   
 
Prateek Maheshwari  

 ContainerManager 
 

 What does handleExpiredRequestForControlActionOrHostAffinityEnabled mean?
 
Sanil Jain  
 
 I have changed it in code to handleExpiredRequest for brevity; expired requests only apply to host-affinity-enabled cases and to control actions.
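The applicability rule in this reply can be sketched as a small decision function. The thread states only when expired-request handling applies: with host affinity enabled, or for control actions. The two outcomes below, `FAIL_CONTROL_ACTION` and `RETRY_ON_ANY_HOST`, and the precedence given to control actions are hypothetical choices for illustration, not what `handleExpiredRequest` actually does.

```java
// Hypothetical outcomes; the thread specifies *when* expiry handling applies, not what it does.
enum ExpiryAction { NOT_APPLICABLE, FAIL_CONTROL_ACTION, RETRY_ON_ANY_HOST }

final class ExpiredRequestPolicy {
  /**
   * Expired-request handling only applies when host affinity is enabled
   * or when the request was issued for a control action (e.g. a move).
   */
  static ExpiryAction onExpired(boolean hostAffinityEnabled, boolean isControlAction) {
    if (isControlAction) {
      // Assumption: report failure instead of silently landing on another host.
      return ExpiryAction.FAIL_CONTROL_ACTION;
    }
    if (hostAffinityEnabled) {
      // Assumption: give up on the preferred host and accept any host.
      return ExpiryAction.RETRY_ON_ANY_HOST;
    }
    return ExpiryAction.NOT_APPLICABLE;
  }
}
```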
 
3 new edits 
 
Sanil Jain edited this page 
 
Here's what changed: 
 
handleExpiredRequestForControlActionOrHostAffinityEnabled
Status
Current state: [ UNDER DISCUSSION ]
...
KV for ContainerPlacementRequestMessage & ContainerPlacementResponseMessage

Key: processorId
Value:
  uuid: unique identifier of a request, populated by the client
  applicationId: unique identifier of the deployed app for which the action is taken
  destination-host: valid hostname / "ANY_HOST" / "STANDBY"
  request-expiry-timeout [optional]: timeout for any resource request to the cluster manager
 
 
 
  Part 2. Container Placement Service  ... 
 
 
 
Code Block (java): ContainerManager
 public class ContainerManager {
/**
* Registers a container placement action to move the running container to the destination host
*
* @param requestMessage request containing details of placement request
* @param containerAllocator to request physical resources
*/
public void registerContainerPlacementAction(ContainerPlacementRequestMessage requestMessage, ContainerAllocator containerAllocator) {...}

/**
* Handles the container start action for both active & standby containers. This method is invoked by the allocator thread
*
* @param request pending request for the preferred host
* @param preferredHost preferred host to start the container
* @param allocatedResource resource allocated from {@link ClusterResourceManager}
* @param resourceRequestState state of request in {@link ContainerAllocator}
* @param allocator to request resources from {@link ClusterResourceManager}
*
* @return true if the container launch is complete, false if the container launch is in progress. 
*/
boolean handleContainerLaunch(SamzaResourceRequest request, String preferredHost, SamzaResource allocatedResource,
    ResourceRequestState resourceRequestState, ContainerAllocator allocator) {...}

/**
* Handles the container launch failure for active containers and standby (if enabled).
*
* @param processorId logical id of the container, e.g. 0, 1, 2
* @param containerId last known id of the container deployed
* @param preferredHost host on which container is requested to be deployed
* @param containerAllocator allocator for requesting resources
*/
void handleContainerLaunchFail(String processorId, String containerId, String preferredHost,
   ContainerAllocator containerAllocator) {...}

/**
* Handles the state update on successful launch of a container
*
* @param processorId logical processor id of container 0,1,2
*/
void handleContainerLaunchSuccess(String processorId) {...}

/**
* Handles the action to be taken after the container has been stopped.
*
* @param processorId logical id of the container, e.g. 0, 1, 2
* @param containerId last known id of the container deployed
* @param preferredHost host on which container was last deployed
* @param exitStatus exit code returned by the container
* @param preferredHostRetryDelay delay to be incurred

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-06 Thread Sanil Jain (Confluence)
 
There's 1 resolved inline comment and 1 new edit on this page 
 
1 resolved inline comment 
 
Sanil Jain resolved 

Control Plane as described plane above the job 
 
 Can you clarify this sentence? Not clear what this means.  
 
1 new edit 
 
Sanil Jain edited this page 
 
Here's what changed: 
 
 ...  Part 1. Container Placement Handler  
 
 Control Plane is a channel outside the job that allows multiple controllers, such as the Samza Dashboard and the Startpoints controller, to take control actions.
 ContainerPlacementHandler is a stateless handler registered to the control plane that dispatches placement actions by invoking Container Placement Service APIs.
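The handler's role can be sketched as a thin, stateless dispatcher. This assumes control-plane messages arrive as a map from processorId (the message key) to the request fields. `PlacementService` and `StatelessPlacementHandler` are hypothetical stand-ins. The method name `placeContainer` follows the rename agreed on in the review thread, and the real Container Placement Service API may differ.

```java
import java.util.Map;

// Hypothetical stand-in for the Container Placement Service API the handler invokes.
interface PlacementService {
  void placeContainer(String processorId, Map<String, String> request);
}

// Stateless: holds only a reference to the service; everything else arrives with each batch.
final class StatelessPlacementHandler {
  private final PlacementService service;

  StatelessPlacementHandler(PlacementService service) {
    this.service = service;
  }

  /** Dispatches each (processorId -> request fields) message to the service, in map order. */
  void dispatch(Map<String, Map<String, String>> messages) {
    for (Map.Entry<String, Map<String, String>> e : messages.entrySet()) {
      service.placeContainer(e.getKey(), e.getValue());
    }
  }
}
```

Keeping dedup and request tracking out of this class means it can be restarted or replaced freely. Any state it would otherwise need belongs to the service behind it.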
 ...  

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-06 Thread Sanil Jain (Confluence)
 
There are 6 inline comment updates and 1 new edit on this page
 
6 inline comment updates 
 
Sanil Jain resolved 
UNAUTHORIZED 
No Bad request? E.g., what if one of the ids is invalid? Why is this status needed, btw? Is this request async as well?
Sanil Jain resolved 
processor-id: 
 No appId / uuid?  
Sanil Jain  
 good catch  
Sanil Jain resolved 
ACCEPTED 
 Same as above.  
Sanil Jain  
 sure, this should be similar to 1  
Sanil Jain resolved 
References 
 Remove references to internal docs.  
1 new edit 
Sanil Jain edited this page 
Here's what changed: 
 ... 

API: placeContainer

Description:
  Active container: stops the container process on the source host and starts it:
    For a stateless job, on either
      the destination host (the destination host can also be the source host), or
      any host (destination-host = ANY_HOST)
    For a stateful job, on either
      the destination host (if specified; the destination host can also be the source host),
      a standby container (destination-host = STANDBY), or
      any host (destination-host = ANY_HOST)
  Standby container: stops the container process on the source host and starts it on either
      the destination host (if specified and it satisfies the standby constraints), or
      any other host that satisfies the standby constraints

Parameters:
  uuid: unique identifier of a request, populated by the client
  applicationId: unique identifier of the deployed app for which the action is taken
  processor-id: Samza resource id of the container, e.g. 0, 1, 2
  destination-host: a valid hostname, "ANY_HOST", or "STANDBY"
  request-expiry-timeout (optional): timeout for any resource request to the cluster manager

Status codes:
  CREATED, BAD_REQUEST, ACCEPTED, IN_PROGRESS, SUCCEEDED, FAILED

Returns:
  Nothing; this is an asynchronous API. The status of the request can be queried by processorId.

Failure scenarios:
  A request to place a container can fail in the following cases:
  Stopping the active container fails: the request is marked failed.
  The requested resources cannot be obtained from the cluster manager: the request is marked failed.
  The stopped active container fails to start on the destination host: the request is marked failed and the container is started again on the source host; if that also fails, it is started on ANY_HOST.

 ... 

API: containerStatus

Description:
  Returns the status and details of a container placement request, e.g. whether the container is running or stopped and which control commands have been issued on it.

Parameters:
  processor-id: Samza resource id of the container, e.g. 0, 1, 2
  applicationId: unique identifier of the deployed app for which the action is taken
  uuid: unique identifier of a request

Status codes:
  BAD_REQUEST, UNAUTHORIZED

Returns:
  Status of the container placement action

 ... 

API: controlStandBy

Description:
  Starts or stops a standby container for the active container.

Parameters:
  processor-id: Samza resource id of the container, e.g. 0, 1, 2
  applicationId: unique identifier of the deployed app for which the action is taken
  uuid: unique identifier of a request

Status codes:
  CREATED, BAD_REQUEST, ACCEPTED, UNAUTHORIZED, IN_PROGRESS, SUCCEEDED, FAILED   
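Since placeContainer is asynchronous, a client typically polls containerStatus until the request settles. The Java sketch below encodes one plausible ordering of the listed status codes. The transition table, the PlacementStatus enum and the PlacementStatusTransitions helper are assumptions for illustration, not part of the proposal, and UNAUTHORIZED is left out for brevity.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

// Status codes as listed for placeContainer.
enum PlacementStatus { CREATED, BAD_REQUEST, ACCEPTED, IN_PROGRESS, SUCCEEDED, FAILED }

// Assumed transitions between status codes; illustrative only.
final class PlacementStatusTransitions {
  private static final Map<PlacementStatus, Set<PlacementStatus>> NEXT = new EnumMap<>(PlacementStatus.class);

  static {
    NEXT.put(PlacementStatus.CREATED, EnumSet.of(PlacementStatus.ACCEPTED, PlacementStatus.BAD_REQUEST));
    NEXT.put(PlacementStatus.ACCEPTED, EnumSet.of(PlacementStatus.IN_PROGRESS, PlacementStatus.FAILED));
    NEXT.put(PlacementStatus.IN_PROGRESS, EnumSet.of(PlacementStatus.SUCCEEDED, PlacementStatus.FAILED));
    // Terminal states: no further transitions are allowed.
    NEXT.put(PlacementStatus.BAD_REQUEST, EnumSet.noneOf(PlacementStatus.class));
    NEXT.put(PlacementStatus.SUCCEEDED, EnumSet.noneOf(PlacementStatus.class));
    NEXT.put(PlacementStatus.FAILED, EnumSet.noneOf(PlacementStatus.class));
  }

  static boolean canTransition(PlacementStatus from, PlacementStatus to) {
    return NEXT.get(from).contains(to);
  }

  // A polling client can stop once the request reaches a terminal state.
  static boolean isTerminal(PlacementStatus status) {
    return NEXT.get(status).isEmpty();
  }
}
```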

Architecture

To implement a scalable container placement control system, the proposed solution is divided into two parts: ...

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-07 Thread Sanil Jain (Confluence)
There's 1 resolved inline comment and 1 new edit on this page 
1 resolved inline comment 
Sanil Jain resolved 
No need to build Authentication since AM runs on hosts which are blacklisted for anyone except the Samza Team 
 LI specific detail, not true in general.  
1 new edit 
Sanil Jain edited this page 
Here's what changed: 
  handleExpiredRequestForControlActionOrHostAffinityEnabled   Status   Current state: [ UNDER DISCUSSION ] ...  




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-14 Thread Sanil Jain (Confluence)
There's 1 new edit on this page 
Sanil Jain edited this page 
Here's what changed: 
 ... 

Pros:
  Simple to extend the existing REST endpoint
  If the AM dies, all outstanding requests are discarded (no additional handling needed)
Cons:
  Need to build authentication
  Need to build an authorization layer around these REST endpoints
  Loading the already heavily loaded job coordinator with another service might increase its memory usage
  Need to build a service for discovery, or rely on the YARN embedded servlet

Implementation Details:

 ContainerPlacementHandler is a stateless handler that dispatches ContainerPlacementRequestMessages from the metastore to the Container Placement Service, and ContainerPlacementResponseMessages from the Container Placement Service to the metastore, so that external controllers can query the status of an action (PR).
 The metastore Samza uses by default today is Kafka (the coordinator stream), which stores configs and container mappings and is log-compacted.
 ContainerPlacementRequestMessage and ContainerPlacementResponseMessage are maintained in a single namespace ("samza-place-container-v1") using NamespaceAwareMetaStore.

Key-Value Format

The key for storing a ContainerPlacementRequestMessage or ContainerPlacementResponseMessage in the metastore is UUID + "." + messageType (ContainerPlacementResponseMessage or ContainerPlacementRequestMessage). The value is the message payload itself. Messages are written to and read from the metastore through the MetadataStore abstraction.

ContainerPlacementResponseMessage:

Key: "UUID.subType"
Value fields:
  uuid (required): unique identifier of a response message
  processorId (required): logical processor id (0, 1, 2) of the container
  deploymentId (required): unique identifier for a deployment
  subType (required): type of message, here ContainerPlacementResponseMessage
  destinationHost (required): destination host where the container is desired to be moved
  statusCode (required): status of the current action
  responseMessage (required): response message accompanying the status
  timestamp (required): timestamp of the response message
  requestExpiry (optional): request expiry, which acts as a timeout for any resource request to the cluster resource manager

Sample KV:
  Key:
  [1,"samza-place-container-v1","88b0d30c-d518-4307-9e8e-c8529eb30f04.ContainerPlacementResponseMessage"]
  Value:
  {"processorId":"1","deploymentId":"app-atttempt-001","subType":"ContainerPlacementResponseMessage","responseMessage":"Request is accepted","uuid":"88b0d30c-d518-4307-9e8e-c8529eb30f04","destinationHost":"ANY_HOST","statusCode":"ACCEPTED","timestamp":1578694070875}   
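The "UUID.subType" key layout can be built and parsed with a few lines of Java. This is a minimal sketch with hypothetical helper names; it covers only the inner key string, not the surrounding array that wraps it in the sample KV.

```java
import java.util.UUID;

// Builds and parses the "UUID.subType" metastore key; helper names are hypothetical.
final class PlacementMessageKeys {
  static final String REQUEST_SUBTYPE = "ContainerPlacementRequestMessage";
  static final String RESPONSE_SUBTYPE = "ContainerPlacementResponseMessage";

  static String key(UUID uuid, String subType) {
    return uuid + "." + subType;
  }

  // A UUID's string form contains only hex digits and hyphens, so the first '.'
  // always separates the UUID from the subType.
  static UUID uuidOf(String key) {
    return UUID.fromString(key.substring(0, key.indexOf('.')));
  }

  static String subTypeOf(String key) {
    return key.substring(key.indexOf('.') + 1);
  }
}
```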

GC policy for stale messages in the metastore:
 One way to delete stale ContainerPlacementMessages is to delete the requests and responses from the previous incarnation of the job from the metastore on job restart.
 Once a request is complete, ContainerPlacementService can issue an asynchronous delete to the metastore.

Part 2. Container Placement Service

The Container Placement Service is a set of APIs built around the AM to move or restart containers. The solution proposes to refactor and simplify the current AM code and to introduce a ContainerManager, a single entity that manages container actions such as start and stop for both active and standby containers. The functions of ContainerManager and the proposed refactoring of the AM code are listed below. ...




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-14 Thread Sanil Jain (Confluence)
There's 3 resolved inline comments and 1 new edit on this page 
3 resolved inline comments 
Sanil Jain resolved 
individual namespaces 
 Are requests and responses separate namespaces? If so, why?  
Sanil Jain resolved 
processorId 
 Where is UUID and deployment ID? In the payload?  Can you document the message structure (key and value).  
Sanil Jain resolved 
delete messages 
 Who does this?  
1 new edit 
Sanil Jain edited this page 
Here's what changed: 
...

The key for storing a ContainerPlacementRequestMessage or ContainerPlacementResponseMessage in the metastore is UUID + "." + messageType (ContainerPlacementResponseMessage or ContainerPlacementRequestMessage). The value is the message payload itself. Messages are written to and read from the metastore through the MetadataStore abstraction. Since the metastore is eventually consistent, ContainerPlacementService must handle duplicate messages.

ContainerPlacementRequestMessage:

Key: "UUID.subType"
Value fields:
  uuid (required): unique identifier of a request message
  processorId (required): logical processor id (0, 1, 2) of the container
  deploymentId (required): unique identifier for a deployment
  subType (required): type of message, here ContainerPlacementRequestMessage
  destinationHost (required): destination host where the container is desired to be moved
  statusCode (required): status of the current action
  timestamp (required): timestamp of the request message
  requestExpiry (optional): request expiry, which acts as a timeout for any resource request to the cluster resource manager

Sample KV:
  Key:
  [1,"samza-place-container-v1","f068175b-c9b6-4f34-982b-ecb5619f21de.ContainerPlacementRequestMessage"]
  Value:
  {"processorId":"1","deploymentId":"app-atttempt-001","subType":"ContainerPlacementRequestMessage","uuid":"f068175b-c9b6-4f34-982b-ecb5619f21de","destinationHost":"ANY_HOST","statusCode":"CREATED","timestamp":1578693870484}

ContainerPlacementResponseMessage:

Key: "UUID.subType"
Value fields:
  uuid (required): unique identifier of a response message
  processorId (required): logical processor id (0, 1, 2) of the container
  deploymentId (required): unique identifier for a deployment
  subType (required): type of message, here ContainerPlacementResponseMessage
  destinationHost (required): destination host where the container is desired to be moved
  statusCode (required): status of the current action
  responseMessage (required): response message accompanying the status
  timestamp (required): timestamp of the response message
  requestExpiry (optional): request expiry, which acts as a timeout for any resource request to the cluster resource manager

Sample KV:
  Key:
  [1,"samza-place-container-v1","88b0d30c-d518-4307-9e8e-c8529eb30f04.ContainerPlacementResponseMessage"]
  Value:
  {"processorId":"1","deploymentId":"app-atttempt-001","subType":"ContainerPlacementResponseMessage","responseMessage":"Request is accepted","uuid":"88b0d30c-d518-4307-9e8e-c8529eb30f04","destinationHost":"ANY_HOST","statusCode":"ACCEPTED","timestamp":1578694070875}

GC policy for stale messages in the metastore:
 One way to delete stale ContainerPlacementMessages is to delete the requests and responses from the previous incarnation of the job from the metastore on job restart.
 Once a request is complete, ContainerPlacementService can issue an asynchronous delete to the metastore.
 Request and response messages can also be cleaned up externally by a tool.

Part 2. Container Placement Service ...

 Remove HostAwareContainerAllocator and ContainerAllocator, and simplify the container allocator into a lightweight entity that allocates requests to available resources (PR1, PR2)
 Introduce ContainerManager which acts as a brain for validating and issuing any actions on containers in the Job C

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-14 Thread Sanil Jain (Confluence)
There's 1 new edit on this page 
Sanil Jain edited this page 
Here's what changed: 
...

ContainerPlacementRequestMessage:

Key: "UUID.subType"
Value fields:
  uuid (required): unique identifier of a request message
  processorId (required): logical processor id (0, 1, 2) of the container
  deploymentId (required): unique identifier for a deployment
  subType (required): type of message, here ContainerPlacementRequestMessage
  destinationHost (required): destination host where the container is desired to be moved
  statusCode (required): status of the current action
  timestamp (required): timestamp of the request message
  requestExpiry (optional): request expiry, which acts as a timeout for any resource request to the cluster resource manager

Sample KV:
  Key:
  [1,"samza-place-container-v1","f068175b-c9b6-4f34-982b-ecb5619f21de.ContainerPlacementRequestMessage"]
  Value:
  {"processorId":"1","deploymentId":"app-atttempt-001","subType":"ContainerPlacementRequestMessage","uuid":"f068175b-c9b6-4f34-982b-ecb5619f21de","destinationHost":"ANY_HOST","statusCode":"CREATED","timestamp":1578693870484}

 ContainerPlacementResponseMessage:  ... 

 One way to delete stale ContainerPlacementMessages is to delete the requests and responses from the previous incarnation of the job from the metastore on job restart; this is the responsibility of ContainerPlacementService.
 Once a request is complete, ContainerPlacementService can issue an asynchronous delete to the metastore.
 Request / response message can be externally cleaned by a tool  
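The restart-time cleanup amounts to deleting every message whose deploymentId differs from the current one. A minimal Java sketch, assuming a plain Map from metastore key to deploymentId stands in for the samza-place-container-v1 namespace; a real implementation would go through the MetadataStore abstraction instead, and StalePlacementMessageCleaner is a hypothetical name.

```java
import java.util.Map;

// Sketch of the "delete messages from the previous incarnation" GC policy.
// The map stands in for the metastore namespace: metastore key -> deploymentId.
final class StalePlacementMessageCleaner {
  /** Removes every entry written by another deployment and returns how many were deleted. */
  static int deleteStale(Map<String, String> keyToDeploymentId, String currentDeploymentId) {
    int before = keyToDeploymentId.size();
    // Removing through the values() view removes the whole map entry.
    keyToDeploymentId.values().removeIf(deploymentId -> !currentDeploymentId.equals(deploymentId));
    return before - keyToDeploymentId.size();
  }
}
```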
 ...  




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-14 Thread Sanil Jain (Confluence)
There's 2 new edits on this page 
Sanil Jain edited this page 
Here's what changed: 
 ... 

 Remove HostAwareContainerAllocator and ContainerAllocator, and simplify the container allocator into a lightweight entity that allocates requests to available resources (PR1, PR2)
 Introduce ContainerManager, which acts as the brain for validating and issuing any actions on containers in the Job Coordinator, for both active and standby containers (PR)

   Transfer the state and validation of container launch and expired-request handling from ContainerAllocator to ContainerManager
   Transfer the state and validation for the callback handler's lifecycle management of the container allocator and of resource requests on boot from ClusterResourceManager.Callback (ContainerProcessManager) to ContainerManager

 Encapsulate the logic and state related to container placement actions, such as moves and restarts of active and standby containers, in ContainerManager (PR-1, TBD)

   It is ContainerManager's duty to validate any ContainerPlacementRequestMessages and to invalidate messages from the previous deployment incarnation
   It is ContainerManager's duty to write ContainerPlacementResponseMessages to the metastore so that external controllers can query the status of a request
   ContainerPlacementMetadata is a metadata holder for container actions, e.g. the request id, current status, requested resources, etc.
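As a rough illustration of ContainerManager's validation duty, the Java sketch below rejects a request that comes from a previous deployment, names an unknown processor id, or has no destination host. PlacementRequestValidator and its specific checks are hypothetical; the proposal only states that requests must be validated and messages from earlier deployments invalidated.

```java
import java.util.Set;

// Hypothetical validation step run before a placement request is acted on.
final class PlacementRequestValidator {
  enum Verdict { ACCEPTED, BAD_REQUEST }

  private final String currentDeploymentId;
  private final Set<String> runningProcessorIds;

  PlacementRequestValidator(String currentDeploymentId, Set<String> runningProcessorIds) {
    this.currentDeploymentId = currentDeploymentId;
    this.runningProcessorIds = runningProcessorIds;
  }

  Verdict validate(String deploymentId, String processorId, String destinationHost) {
    if (!currentDeploymentId.equals(deploymentId)) {
      return Verdict.BAD_REQUEST; // issued against a previous incarnation of the job
    }
    if (!runningProcessorIds.contains(processorId)) {
      return Verdict.BAD_REQUEST; // unknown logical container id
    }
    if (destinationHost == null || destinationHost.isEmpty()) {
      return Verdict.BAD_REQUEST; // expected a hostname, "ANY_HOST" or "STANDBY"
    }
    return Verdict.ACCEPTED;
  }
}
```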

 Note: ClusterResourceManager.Callback (ContainerProcessManager) is tightly coupled with ClusterBasedJobCoordinator today. In phase 1 of the implementation, all the proposed changes will be made except moving the state and lifecycle management of the container allocator and of resource requests on boot from ClusterResourceManager.Callback (ContainerProcessManager) to ContainerManager, so that this feature can be developed faster. Hence ContainerProcessManager will still be tied to ClusterBasedJobCoordinator and will intercept any container placement requests.

2.1 Container Move

2.1.1 Stateless Container Move & Stateful Container Move (without Standby) ...

 If the preferred resources cannot be acquired, the active container is never stopped, and a failure notification is sent for the ContainerPlacementRequest
 If ContainerPlacementManager cannot stop the active container (3.1 #1 above fails), the request is marked failed and a failure notification is sent for the ContainerPlacementRequest
 If ClusterResourceManager fails to start the stopped active container on the acquired destination host, an attempt is made to start the container back on the source host, and a failure notification is sent for the ContainerPlacementRequest. If the container also fails to start on the source host, an attempt is made to start it on ANY_HOST
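The ordering implied by the failure cases above (acquire resources first, then stop, then start, with source-host and ANY_HOST fallbacks) can be sketched in Java as follows. ContainerOps and ContainerMoveSketch are hypothetical names, and the sketch simplifies by assuming that restarting on the source host or ANY_HOST needs no separate resource request.

```java
// Hypothetical operations a move needs; names are assumptions for illustration only.
interface ContainerOps {
  boolean acquireResource(String host);           // ask the cluster manager for a resource on host
  boolean stop(String processorId);               // stop the running active container
  boolean start(String processorId, String host); // launch the container on the given host
}

final class ContainerMoveSketch {
  static final String ANY_HOST = "ANY_HOST";

  /** Returns true only if the container ends up running on destinationHost. */
  static boolean move(ContainerOps ops, String processorId, String sourceHost, String destinationHost) {
    // 1. Acquire resources first, so the active container is never stopped without a target.
    if (!ops.acquireResource(destinationHost)) {
      return false;
    }
    // 2. Stop the active container; if this fails, the request fails.
    if (!ops.stop(processorId)) {
      return false;
    }
    // 3. Start on the destination host.
    if (ops.start(processorId, destinationHost)) {
      return true;
    }
    // 4. The request has failed, but restore the container: source host first, then ANY_HOST.
    if (!ops.start(processorId, sourceHost)) {
      ops.start(processorId, ANY_HOST);
    }
    return false;
  }
}
```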
 Option 3: Stateful without Standby (spin up a standby container and then move) (Phase 2) [Stretch] ...




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-14 Thread Sanil Jain (Confluence)
There's 2 new edits on this page 
Sanil Jain edited this page 
Here's what changed: 
 ... 

ContainerPlacementMessage.java (Java):

import java.time.Duration;
import java.util.UUID;

/**
 * Encapsulates the request or response payload information between the ContainerPlacementHandler service and external
 * controllers issuing placement actions
 */
public abstract class ContainerPlacementMessage {

  public enum StatusCode {
    /**
     * Indicates that the container placement action is created
     */
    CREATED,

    /**
     * Indicates that the container placement action was rejected because the request was deemed invalid
     */
    BAD_REQUEST,

    /**
     * Indicates that the container placement action is accepted and waiting to be processed
     */
    ACCEPTED,

    /**
     * Indicates that the container placement action is in progress
     */
    IN_PROGRESS,

    /**
     * Indicates that the container placement action succeeded
     */
    SUCCEEDED,

    /**
     * Indicates that the container placement action failed
     */
    FAILED;
  }

  /**
   * UUID attached to a message, used to identify duplicate request messages written to the metastore so that
   * an action is not taken twice even though the metastore is eventually consistent
   */
  protected final UUID uuid;
  /**
   * Unique identifier for a deployment, so messages can be invalidated across job restarts;
   * for example, a YARN-based cluster manager should set this to the app attempt id
   */
  protected final String applicationId;
  // Logical container id 0, 1, 2
  protected final String processorId;
  // Destination host where the container is desired to be moved
  protected final String destinationHost;
  // Optional request expiry, which acts as a timeout for any resource request to the cluster resource manager
  protected final Duration requestExpiry;
  // Status of the current request
  protected final StatusCode statusCode;
  // Timestamp of the request or response message
  protected final long timestamp;

  protected ContainerPlacementMessage(UUID uuid, String applicationId, String processorId, String destinationHost,
      Duration requestExpiry, StatusCode statusCode, long timestamp) {…}

}  
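The uuid field is what allows duplicates to be detected when an eventually consistent metastore surfaces the same request more than once. A hypothetical in-memory guard in Java, assuming the consumer checks each request's uuid before acting on it; PlacementRequestDeduper is not part of the proposal.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical de-duplication guard keyed by the request uuid.
final class PlacementRequestDeduper {
  private final Set<UUID> processed = new HashSet<>();

  /** Returns true the first time a uuid is seen and false for every later duplicate. */
  synchronized boolean shouldProcess(UUID requestUuid) {
    // Set.add returns false when the element was already present.
    return processed.add(requestUuid);
  }
}
```

In this sketch the set lives only in memory, so it is reset when the AM restarts; messages left over from an earlier deployment would instead be caught by the deploymentId (applicationId) check.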

ContainerPlacementRequestMessage (Java):

 /**
* Encapsulates the request sent from the external controller to the JobCoordinator to take a container placement action
*/
public class ContainerPlacementRequestMessage extends ContainerPlacementMessage {

public ContainerPlacementRequestMessage(UUID uuid, String applicationId, String processorId, String destinationHost, Duration requestExpiry, long timestamp) {...}

public ContainerPlacementRequestMessage(UUID uuid, String applicationId, String processorId, String destinationHost, long timestamp) {...}
}  

ContainerPlacementResponseMessage (Java):

/**
 * Encapsulates the response sent from the JobCoordinator for a container placement action
 */
public class ContainerPlacementResponseMessage extends ContainerPlacementMessage {
  // Response message accompanying the status code
  private String responseMessage;

  public ContainerPlacementResponseMessage(UUID uuid, String applicationId, String processorId, String destinationHost,
      Duration requestExpiry, StatusCode statusCode, String responseMessage, long timestamp) {...}

  public ContainerPlacementResponseMessage(UUID uuid, String applicationId, String processorId, String destinationHost,
      StatusCode statusCode, String responseMessage, long timestamp) {
    // Variant without a request expiry
    this(uuid, applicationId, processorId, destinationHost, null, statusCode, responseMessage, timestamp);
  }
}

ContainerPlacementMetadataStore (Java):

/**
 * Entity managing reads and writes to the metastore for {@link org.apache.samza.container.placement.ContainerPlacementRequestMessage}
 * and {@link org.apache.samza.container.placement.ContainerPlacementResponseMessage}
 */
public class ContainerPlacementMetadataStore {

  /**
   * Writes a {@link ContainerPlacementRequestMessage} to the underlying metastore.
   * This method should be used by external controllers to issue a request to JobCoordinator
   *
   * @param message container placement request
   */
public void writeContainerPlacementRequestMessage(ContainerPlacementRequestMessage message controllers
 * to issue a request to JobCoordinator
 *
 * @param deploymentId identifier of the deployment
 * @param processorId logical id of the samza container 0,1,2
 * @param destinationHost host where the container is desired to move
 * @param requestExpiry opti

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-14 Thread Sanil Jain (Confluence)
There's 2 new edits on this page 
Sanil Jain edited this page 
Here's what changed: 
 ... 

ContainerPlacementRequestAllocator (Java):

/**
 * Stateless handler that periodically dispatches {@link ContainerPlacementRequestMessage} read from Metadata store to Job Coordinator
 */
public class ContainerPlacementRequestAllocator implements Runnable {

@Override
  public void run() {...}
}
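ContainerPlacementRequestAllocator is described as a Runnable that periodically reads requests and dispatches them. One way to drive such a Runnable is a single-threaded ScheduledExecutorService; the Java sketch below assumes hypothetical Supplier/Consumer hooks in place of the metastore read and the ContainerManager call, and leaves the polling interval to the caller.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical polling loop in the spirit of ContainerPlacementRequestAllocator.
final class PollingRequestDispatcher implements Runnable {
  private final Supplier<List<String>> readPendingRequests; // stands in for the metastore read
  private final Consumer<String> dispatchToCoordinator;     // stands in for the ContainerManager call

  PollingRequestDispatcher(Supplier<List<String>> readPendingRequests, Consumer<String> dispatchToCoordinator) {
    this.readPendingRequests = readPendingRequests;
    this.dispatchToCoordinator = dispatchToCoordinator;
  }

  @Override
  public void run() {
    // Exceptions are caught because an uncaught exception would suppress
    // all later scheduled executions of this task.
    try {
      for (String request : readPendingRequests.get()) {
        dispatchToCoordinator.accept(request);
      }
    } catch (RuntimeException e) {
      System.err.println("Failed to dispatch placement requests: " + e);
    }
  }

  // Runs the dispatcher repeatedly, waiting intervalMs after each run completes.
  static ScheduledExecutorService start(PollingRequestDispatcher dispatcher, long intervalMs) {
    ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleWithFixedDelay(dispatcher, 0, intervalMs, TimeUnit.MILLISECONDS);
    return executor;
  }
}
```

scheduleWithFixedDelay measures the delay from the end of one run to the start of the next, so a slow metastore read simply stretches the polling period; the caller should shut the returned executor down when the coordinator stops.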

ContainerManager (Java):

 public class ContainerManager {
/**
* Registers a container placement action to move the running container to destination host
*
* @param requestMessage request containing details of placement request
* @param containerAllocator to request physical resources
*/
public void registerContainerPlacementAction(ContainerPlacementRequestMessage requestMessage, ContainerAllocator containerAllocator) {...}

/**
* Handles the container start action for both active & standby containers. This method is invoked by the allocator thread
*
* @param request pending request for the preferred host
* @param preferredHost preferred host to start the container
* @param allocatedResource resource allocated from {@link ClusterResourceManager}
* @param resourceRequestState state of request in {@link ContainerAllocator}
* @param allocator to request resources from {@link ClusterResourceManager}
*
* @return true if the container launch is complete, false if the container launch is in progress. 
*/
boolean handleContainerLaunch(SamzaResourceRequest request, String preferredHost, SamzaResource allocatedResource,
   ResourceRequestState resourceRequestState, ContainerAllocator allocator) {..}

/**
* Handle the container launch failure for active containers and standby (if enabled).
*
* @param processorId logical id of the container eg 1,2,3
* @param containerId last known id of the container deployed
* @param preferredHost host on which container is requested to be deployed
* @param containerAllocator allocator for requesting resources
*/
void handleContainerLaunchFail(String processorId, String containerId, String preferredHost,
   ContainerAllocator containerAllocator) {...}

/**
* Handles the state update on successful launch of a container
*
* @param processorId logical processor id of container 0,1,2
*/
void handleContainerLaunchSuccess(String processorId) {...}

/**
* Handles the action to be taken after the container has been stopped.
*
* @param processorId logical id of the container eg 1,2,3
* @param containerId last known id of the container deployed
* @param preferredHost host on which container was last deployed
* @param exitStatus exit code returned by the container
* @param preferredHostRetryDelay delay to be incurred before requesting resources
* @param containerAllocator allocator for requesting resources
*/
void handleContainerStop(String processorId, String containerId, String preferredHost, int exitStatus,
   Duration preferredHostRetryDelay, ContainerAllocator containerAllocator) {..}

/**
* Handles an expired resource request for both active and standby containers.
*
* @param processorId logical id of the container
* @param preferredHost host on which container is requested to be deployed
* @param request pending request for the preferred host
* @param allocator allocator for requesting resources
* @param resourceRequestState state of request in {@link ContainerAllocator}
*/
void handleExpiredRequest(String processorId, String preferredHost,
   SamzaResourceRequest request, ContainerAllocator allocator, ResourceRequestState resourceRequestState) {..}
}

  
 
 
 ...  
This message was sent by Atlassian Confluence 7.1.2  




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-15 Thread Sanil Jain (Confluence)
There's 1 new edit on this page 
Sanil Jain edited this page 
Here's what changed: 
 ... 
Pros:
 Simple to implement: the current tool already does this for host-affinity-enabled jobs (since they maintain a locality mapping)
Cons:
 Needs a job restart, and only makes a best effort to get the preferred hosts for containers, with no guarantee of getting them
 If a job has standby containers enabled, this method involves changing the standby mapping in addition to the active container mappings
 The job faces downtime even when it has hundreds of containers and only one of them needs to be restarted; if the job is stateful, containers may not get the newly requested resources on restart and may start bootstrapping
 This solution does not scale for controllers that want to take multiple control actions on containers across several jobs, for example, an auto-sizing controller
 This method will not work for building Canary / Cluster Balancer
Solution 2. Container Placement Handler & Service [Accepted]
API design
Based on the types of control actions, the commands are the following:  ...
The key for storing a ContainerPlacementRequestMessage or ContainerPlacementResponseMessage in the Metastore is UUID + "." + messageType (ContainerPlacementResponseMessage or ContainerPlacementRequestMessage). The value is the payload of the ContainerPlacementRequestMessage or ContainerPlacementResponseMessage. Messages are written to and read from the Metastore through the MetadataStore abstraction. Since the metastore is eventually consistent, ContainerPlacementService must handle duplicate messages.
ContainerPlacementRequestMessage:  ...
Example metastore entry:
Key: [1,"samza-place-container-v1","88b0d30c-d518-4307-9e8e-c8529eb30f04.ContainerPlacementResponseMessage"]
Value:
 {"processorId":"1","deploymentId":"app-atttempt-001","subType":"ContainerPlacementResponseMessage","responseMessage":"Request is accepted","uuid":"88b0d30c-d518-4307-9e8e-c8529eb30f04","destinationHost":"ANY_HOST","statusCode":"ACCEPTED","timestamp":1578694070875}  
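The following minimal Java sketch builds and splits a `<uuid>.<messageType>` key under the key scheme above. `PlacementMessageKey` and its methods are hypothetical names and are not Samza APIs. The sketch covers only the inner key string; the example entry shows the metastore wrapping it with a version and a namespace such as `samza-place-container-v1`. It relies on the standard `java.util.UUID` string form, which never contains a '.'.

```java
import java.util.UUID;

/**
 * Hypothetical helper for the "<uuid>.<messageType>" metastore key described in the
 * design; not part of Samza's API.
 */
final class PlacementMessageKey {
  static final String REQUEST = "ContainerPlacementRequestMessage";
  static final String RESPONSE = "ContainerPlacementResponseMessage";

  private PlacementMessageKey() {}

  /** Builds a key such as "88b0d30c-...-c8529eb30f04.ContainerPlacementResponseMessage". */
  static String build(UUID uuid, String messageType) {
    if (!REQUEST.equals(messageType) && !RESPONSE.equals(messageType)) {
      throw new IllegalArgumentException("Unknown message type: " + messageType);
    }
    return uuid + "." + messageType;
  }

  /** Returns the UUID part of a key: everything before the first '.'. */
  static UUID uuidOf(String key) {
    return UUID.fromString(key.substring(0, dotIndex(key)));
  }

  /** Returns the message type part of a key: everything after the first '.'. */
  static String typeOf(String key) {
    return key.substring(dotIndex(key) + 1);
  }

  private static int dotIndex(String key) {
    int dot = key.indexOf('.');
    if (dot < 0) {
      throw new IllegalArgumentException("Malformed key: " + key);
    }
    return dot;
  }
}
```

Requests and responses for the same action share one UUID and differ only in the suffix. A reader can therefore find the response for a request it wrote by swapping the message type.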
Challenges with Metastore
Metastore today (Kafka) is at-least-once and eventually consistent. ContainerPlacementService therefore has to cache the UUIDs of accepted actions in memory, so that it does not act on the same request twice when duplicates are delivered. This cache must be bounded, since an unbounded cache can make the job run out of memory. A UUID is 16 bytes. If a job has at most, say, 500 containers, one request action per container for all 500 containers adds about 0.008 MB of memory (in-memory lookup only). If we cache, say, the last 20K actions (enough for 40 failovers of 500 containers in the current scenario), memory use will be at most about 0.64 MB with a FIFO cache (in-memory lookup + FIFO queue).
GC policy for stale messages in metastore  ...




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-15 Thread Sanil Jain (Confluence)
There's 3 new edits on this page 
Sanil Jain edited this page 
Here's what changed: 
 ... 
 
 This system does not intend to solve the problem of dynamic workload balancing (i.e., Cruise Control). It may act as a building block for one later.
 Solving the canary problem for the YARN-based deployment model is out of scope for this solution; however, the system should be easily extensible to support canary.
 This system will not have built-in intelligence to find a better host for a container; it will make simple decisions based on the parameters passed by the user.
 ... 
Key: "UUID.subType"
Value fields (name: description, Required/Optional):
 uuid: Unique identifier of a response message (Required)
 processorId: Logical processor id (0, 1, 2, ...) of the container (Required)
 deploymentId: Unique identifier for a deployment (Required)
 subType: Type of message, here ContainerPlacementRequestMessage (Required)
 destinationHost: Destination host where the container is desired to be moved (Required)
 statusCode: Status of the current action (Required)
 responseMessage: Response message in conjunction with the request (Required)
 timestamp: The timestamp of the response message (Required)
 requestExpiry: Request expiry, which acts as a timeout for any resource request to the cluster resource manager (Optional)
 ... 
Key: "UUID.subType"
Value fields (name: description, Required/Optional):
 uuid: Unique identifier of a response message (Required)
 processorId: Logical processor id (0, 1, 2, ...) of the container (Required)
 deploymentId: Unique identifier for a deployment (Required)
 subType: Type of message, here ContainerPlacementResponseMessage (Required)
 destinationHost: Destination host where the container is desired to be moved (Required)
 statusCode: Status of the current response (Required)
 responseMessage: Response message in conjunction with the status (Required)
 timestamp: The timestamp of the response message (Required)
 requestExpiry: Request expiry, which acts as a timeout for any resource request to the cluster resource manager (Optional)
 ...  Metastore today (Kafka) is at-least-once and eventually consistent. ContainerPlacementService therefore has to cache the UUIDs of accepted actions in memory, so that it does not act on the same request twice when duplicates are delivered. This cache must be bounded, since an unbounded cache can make the job run out of memory. A UUID is 16 bytes. If a job has at most, say, 500 containers, one request action per container for all 500 containers adds about 0.008 MB of memory (in-memory lookup only). If we cache, say, the last 20K actions (enough for 40 failovers of 500 containers in the current scenario), memory use will be at most about 0.64 MB with a FIFO cache (in-memory lookup + FIFO queue).  ...
 
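The following minimal sketch shows the bounded FIFO de-duplication cache described above. It assumes request UUIDs are `java.util.UUID` values, and `AcceptedRequestCache` is a hypothetical name rather than Samza's implementation. A `HashSet` provides the in-memory lookup, and an `ArrayDeque` records insertion order so that the oldest UUID is evicted first once capacity is reached. The size estimate counts only the 16-byte UUID payload in each of the two structures, which reproduces the 0.64 MB figure for 20K entries (2 × 16 B × 20,000 = 640,000 B). Actual JVM heap use is higher because of object headers and hash-table entries.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

/**
 * Hypothetical bounded FIFO cache of already-accepted request UUIDs, used to drop
 * duplicates delivered by an at-least-once metastore. Not Samza's actual class.
 */
final class AcceptedRequestCache {
  private final int capacity;
  private final Set<UUID> lookup = new HashSet<>();          // O(1) duplicate check
  private final ArrayDeque<UUID> order = new ArrayDeque<>(); // insertion order for FIFO eviction

  AcceptedRequestCache(int capacity) {
    if (capacity <= 0) {
      throw new IllegalArgumentException("capacity must be positive");
    }
    this.capacity = capacity;
  }

  /** Returns true if the UUID is new (and records it), false if it is a duplicate. */
  synchronized boolean markAccepted(UUID uuid) {
    if (lookup.contains(uuid)) {
      return false;
    }
    if (order.size() == capacity) {
      lookup.remove(order.pollFirst()); // evict the oldest accepted UUID
    }
    order.addLast(uuid);
    lookup.add(uuid);
    return true;
  }

  synchronized int size() {
    return order.size();
  }

  /** Payload-only estimate: 16 bytes per UUID, held in both the set and the queue. */
  static long estimatedPayloadBytes(int entries) {
    return 2L * 16L * entries;
  }
}
```

The cache evicts by arrival order, so it is FIFO rather than LRU. This cache alone cannot detect a duplicate that arrives after more than `capacity` newer requests have been accepted.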
 Remove the HostAwareContainerAllocator & ContainerAllocator, and simplify ContainerAllocator into a lightweight entity that allocates requests to available resources (PR1, PR2)
 Introduce ContainerManager, which acts as the brain for validating and issuing any actions on both active and standby containers in the Job Coordinator (PR)
  Transfer the state and validation of container launch and expired-request handling from ContainerAllocator to ContainerManager
  Transfer the state and lifecycle management of the container allocator and resource requests on job start (reading the locality mapping) from ClusterResourceManager.Callback (ContainerProcessManager) to ContainerManager*
 Encapsulate the logic and state related to container placement actions, such as moves and restarts of active and standby containers, in ContainerManager (PR-1, TBD)
  ContainerManager is responsible for validating any ContainerPlacementRequestMessages and for invalidating messages from the previous deployment incarnation
  ContainerManager is responsible for writing ContainerPlacementResponseMessages to the Metastore so that the external controller can query the status of the request
  ContainerPlacementMetadata is a metadata holder for container actions (ControlActionMetadata), for example, request_id, current status, requested resources, etc.

 Note: *ClusterResourceManager.Callback (ContainerProcessManager) is tightly coupled with ClusterBasedJobCoordinator today. In phase 1 of the implementation, all the proposed changes will be made except for moving the state and lifecycle management of the container allocator and resource requests on job start (reading the locality mapping) from ClusterResourceManager.Callback (ContainerProcessManager) to ContainerManager.

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-15 Thread Sanil Jain (Confluence)
There's 5 reopened inline comments and 2 new edits on this page 
5 reopened inline comments 
Sanil Jain reopened 
No need to build Authentication since AM runs on hosts which are blacklisted for anyone except the Samza Team 
 LI specific detail, not true in general.  
Sanil Jain reopened 
individual namespaces 
 Are requests and responses separate namespaces? If so, why?  
Sanil Jain reopened 
processorId 
 Where is UUID and deployment ID? In the payload?  Can you document the message structure (key and value).  
Sanil Jain reopened 
delete messages 
 Who does this?  
Sanil Jain reopened 
References 
 Remove references to internal docs.  
2 new edits 
Sanil Jain edited this page 
Here's what changed: 
 ...
Current state: [ UNDER DISCUSSION ]
Discussion thread: http://mail-archives.apache.org/mod_mbox/samza-dev/202001.mbox/browser
JIRA: SAMZA-2373
Released: TBD
 ...
 
 One way to delete stale ContainerPlacementMessages is to delete requests/responses from the previous incarnation of the job from the metastore on job restart; this is the responsibility of ContainerPlacementService.
 Once a request is complete, ContainerPlacementService can issue an async delete to clean up the request from the metastore.
 Request/response messages can be cleaned up externally by a tool.
 ...  
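The first GC option above deletes the previous incarnation's requests and responses on restart. The following Java sketch shows roughly what that could look like. A `Map` from key to `deploymentId` stands in for the metastore, which the design accesses through the MetadataStore abstraction. `StalePlacementMessageCleaner` is a hypothetical name, and real values would be full JSON payloads rather than bare deployment ids.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch of the "delete messages from previous incarnations on restart"
 * GC policy. Keys are "<uuid>.<messageType>"; values are reduced to the deploymentId
 * each message carries. The Map stands in for the metastore.
 */
final class StalePlacementMessageCleaner {
  private StalePlacementMessageCleaner() {}

  /** Deletes every message whose deploymentId differs from the current one; returns the deleted keys. */
  static List<String> deleteStale(Map<String, String> deploymentIdByKey, String currentDeploymentId) {
    List<String> stale = new ArrayList<>();
    for (Map.Entry<String, String> e : deploymentIdByKey.entrySet()) {
      if (!currentDeploymentId.equals(e.getValue())) {
        stale.add(e.getKey());
      }
    }
    // Delete after iterating so the map is not modified during iteration.
    for (String key : stale) {
      deploymentIdByKey.remove(key);
    }
    return stale;
  }
}
```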




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-16 Thread Sanil Jain (Confluence)
There's 1 resolved inline comment on this page 
Sanil Jain resolved 
Samza Metastore 
 is this available now?  




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-17 Thread Sanil Jain (Confluence)
There's 1 new edit on this page 
Sanil Jain edited this page 
Here's what changed: 
 ...
Container Placement Service is a set of APIs built around the AM to move/restart containers. It periodically fetches a queue of placement actions per container and issues them in parallel across different containers, but sequentially on any one container. Each placement request carries a "deploymentId" because, if a job restarts, all placement actions queued for the previous deployment must be disregarded and deleted. Samza internally generates an id for each run of a job ("app.run.id") at the job planning phase; we can use that id as the "deploymentId" for placement requests.
The solution proposes to refactor and simplify the current AM code and to introduce a ContainerManager, a single entity managing container actions such as start and stop for both active and standby containers. The functions of ContainerManager and the proposed refactoring of the AM code are listed below.  ...
 
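The Container Placement Service scheduling rule described above can be sketched minimally in Java:

* Actions on different containers can be issued in the same pass.
* Actions on the same container run one at a time, in arrival order.
* Requests whose deploymentId differs from the current run's id are rejected.

`PlacementRequest` and `PerContainerActionQueue` are hypothetical types that keep only the fields this rule needs. The real ContainerManager also tracks status, resources and expiry.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical placement request: only the fields needed for this sketch. */
final class PlacementRequest {
  final String uuid;
  final String deploymentId;
  final String processorId;
  final String destinationHost;

  PlacementRequest(String uuid, String deploymentId, String processorId, String destinationHost) {
    this.uuid = uuid;
    this.deploymentId = deploymentId;
    this.processorId = processorId;
    this.destinationHost = destinationHost;
  }
}

/**
 * Hypothetical scheduler: at most one in-flight action per processorId, actions on
 * different processorIds may be issued together, and requests from another
 * deployment (e.g. a previous app.run.id) are rejected.
 */
final class PerContainerActionQueue {
  private final String currentDeploymentId;
  private final Map<String, Deque<PlacementRequest>> pending = new LinkedHashMap<>();
  private final Set<String> inFlight = new HashSet<>();

  PerContainerActionQueue(String currentDeploymentId) {
    this.currentDeploymentId = currentDeploymentId;
  }

  /** Queues the request; returns false if it belongs to a different deployment. */
  boolean submit(PlacementRequest request) {
    if (!currentDeploymentId.equals(request.deploymentId)) {
      return false;
    }
    pending.computeIfAbsent(request.processorId, id -> new ArrayDeque<>()).addLast(request);
    return true;
  }

  /** Returns the next queued action for every container that has no action in flight. */
  List<PlacementRequest> nextActions() {
    List<PlacementRequest> ready = new ArrayList<>();
    for (Map.Entry<String, Deque<PlacementRequest>> e : pending.entrySet()) {
      if (!inFlight.contains(e.getKey()) && !e.getValue().isEmpty()) {
        ready.add(e.getValue().pollFirst());
        inFlight.add(e.getKey());
      }
    }
    return ready;
  }

  /** Marks the in-flight action for this container as finished (succeeded or failed). */
  void complete(String processorId) {
    inFlight.remove(processorId);
  }
}
```

A caller would run `nextActions()` on each polling pass. It would call `complete(processorId)` when an action succeeds or fails, which frees that container for its next queued action.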




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-17 Thread Sanil Jain (Confluence)
There's 1 new edit on this page 
Sanil Jain edited this page 
Here's what changed: 
 ... 
 
 Control Plane is a channel outside the job that allows multiple controllers, such as the Samza Dashboard or a Startpoints controller, to take control actions.
 ContainerPlacementHandler is a stateless handler registered with the control plane that dispatches placement actions by invoking the Container Placement Service APIs.
This control plane can be implemented in the following ways:
Option 1: Metastore API [Preferred]
Metastore provides an API to write to the coordinator stream. One simple way to expose the Container Placement API is for the Container Placement handler to run a coordinator stream consumer that polls control messages from the coordinator stream and acts on them.  ...
 
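Option 1 relies on a consumer that polls control messages and acts on them. The following Java sketch shows one polling pass plus a fixed-delay scheduler. The `Supplier` stands in for reading the metastore, and the `BiConsumer` stands in for handing a request to the Container Placement Service; neither is a real Samza API, and `PlacementRequestPoller` is hypothetical. Following the key scheme from the API design, only keys ending in `.ContainerPlacementRequestMessage` are dispatched, and each key is dispatched at most once.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.BiConsumer;
import java.util.function.Supplier;

/**
 * Hypothetical handler that periodically polls control messages and dispatches new
 * placement requests. The Supplier stands in for a metastore read.
 */
final class PlacementRequestPoller {
  private static final String REQUEST_SUFFIX = ".ContainerPlacementRequestMessage";

  private final Supplier<Map<String, String>> readAll;
  private final BiConsumer<String, String> dispatch; // (key, payload)
  private final Set<String> dispatchedKeys = new HashSet<>();
  private ScheduledExecutorService executor;

  PlacementRequestPoller(Supplier<Map<String, String>> readAll, BiConsumer<String, String> dispatch) {
    this.readAll = readAll;
    this.dispatch = dispatch;
  }

  /** One polling pass; returns the number of newly dispatched requests. */
  synchronized int pollOnce() {
    int dispatched = 0;
    for (Map.Entry<String, String> e : readAll.get().entrySet()) {
      String key = e.getKey();
      // Only act on requests; responses are written by the service itself.
      if (key.endsWith(REQUEST_SUFFIX) && dispatchedKeys.add(key)) {
        dispatch.accept(key, e.getValue());
        dispatched++;
      }
    }
    return dispatched;
  }

  /** Starts polling on a single background thread with a fixed delay between passes. */
  synchronized void start(long periodMs) {
    executor = Executors.newSingleThreadScheduledExecutor();
    executor.scheduleWithFixedDelay(this::pollOnce, 0, periodMs, TimeUnit.MILLISECONDS);
  }

  synchronized void stop() {
    if (executor != null) {
      executor.shutdownNow();
    }
  }
}
```

Here the set of dispatched keys grows without bound. As the Challenges with Metastore discussion notes, a real handler would need a bounded cache instead.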




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-01-21 Thread Sanil Jain (Confluence)
There's 2 new inline comments on this page 
Prateek Maheshwari  
individual namespaces 
 Are requests and responses separate namespaces? If so, why?  
Sanil Jain  
 Now they are in the same namespace with different keys
Prateek Maheshwari  
References 
 Remove references to internal docs.  
Sanil Jain  
 done!  




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-02-03 Thread Boris Shkolnik (Confluence)
There's 1 new inline comment on this page 
Boris Shkolnik  
Hot StandBy 
 It'd be nice to provide a link to a page that describes this feature.




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-02-03 Thread Boris Shkolnik (Confluence)
There's 2 new inline comments on this page 
Boris Shkolnik  
support for Canar 
 please provide a short description on how this will work with Canary.  
Boris Shkolnik  
deployment id 
 what will be used as the deployment id?  




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-02-03 Thread Boris Shkolnik (Confluence)
There's 1 new inline comment on this page 
Boris Shkolnik  
Metastore 
 What happens if the Metastore is not Kafka? How do you define the order?




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-02-04 Thread Boris Shkolnik (Confluence)
There's 1 new inline comment on this page 
Boris Shkolnik  
we attempt to start t 
 do we really need to do something special here? If the container is down, can't we just let Yarn take care of this?




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-02-04 Thread Boris Shkolnik (Confluence)
There's 2 new inline comments on this page 
Boris Shkolnik  
ContainerProcessManager 
 can you please clarify why we need a separate thread for the notifications, instead of callbacks to the caller?
Boris Shkolnik  
has a steady lag 
 not clear what the criteria are here.




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-02-05 Thread Sanil Jain (Confluence)
There's 6 inline comment updates and 1 new edit on this page 
6 inline comment updates 
Boris Shkolnik  
Hot StandBy 
 It'd be nice to provide a link to a page that describes this feature.
Sanil Jain  
 Done, added: SEP-19: Hot standby state for Samza applications
Boris Shkolnik  
support for Canar 
 please provide a short description on how this will work with Canary.  
Sanil Jain  
 Please see the options in 2.1 after API description  
Boris Shkolnik  
deployment id 
 what will be used as the deployment id?  
Sanil Jain  
 "run.id" generated by the Job planner  
Sanil Jain resolved 
we attempt to start t 
 do we really need to do something special here? If the container is down, can't we just let Yarn take care of this?
Sanil Jain  
 rewording this to fall back to source host
Boris Shkolnik  
ContainerProcessManager 
 can you please clarify why we need a separate thread for the notifications, instead of callbacks to the caller?
Sanil Jain  
 The ContainerPlacementRequestAllocator thread periodically reads from the metastore and relays control messages to the ContainerManager. The container allocator thread periodically tries to fulfill each request with resources allocated by the ClusterManager. We do not need a separate thread for notifications!
1 new edit 
Sanil Jain edited this page 
Here's what changed: 
 ...
Selectively Spin StandBy Containers: Samza has a Hot StandBy Containers feature for reducing stateful restoration times. Enabling this feature for a job at least doubles the number of containers (in the simplest case, where every container has one standby replica). Customers are reluctant to enable it because doubling the containers increases the cost to serve. To improve adoption of this feature, we can build the ability to spin up standby containers for a single container or a subset of containers while the job is running. These standby containers can then be used for failover to reduce downtime.
 ...
 
 If the preferred resources cannot be acquired, the active container is never stopped and a failure notification is sent for the ContainerPlacementRequest.
 If the ContainerPlacementManager is not able to stop the active container (3.1 #1 above fails), the request is marked failed and a failure notification is sent for the ContainerPlacementRequest.
 If ClusterResourceManager fails to start the stopped active container on the acquired destination host, the container falls back to the source host and a failure notification is sent for the ContainerPlacementRequest. If the container also fails to start on the source host, an attempt is made to start it on ANY_HOST.
 ...  
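The last bullet above gives a fallback order: the acquired destination host, then the source host, then ANY_HOST. That order can be sketched in Java as follows. `PlacementFallback` is hypothetical, and the `Predicate` stands in for an actual launch attempt through the cluster manager. The sketch does not model the earlier steps of acquiring resources and stopping the active container.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;
import java.util.function.Predicate;

/**
 * Hypothetical sketch of the launch-failure fallback order for a move request:
 * destination host, then the source host, then ANY_HOST.
 */
final class PlacementFallback {
  static final String ANY_HOST = "ANY_HOST";

  private PlacementFallback() {}

  /** Ordered, de-duplicated list of hosts to try (destination may equal source). */
  static List<String> launchOrder(String destinationHost, String sourceHost) {
    Set<String> order = new LinkedHashSet<>();
    order.add(destinationHost);
    order.add(sourceHost);
    order.add(ANY_HOST);
    return new ArrayList<>(order);
  }

  /**
   * Tries each host in order and returns the host the container started on, or empty
   * if every attempt failed. Per the text, the placement request counts as successful
   * only if the returned host is the destination host; otherwise a failure
   * notification is sent even though the container is running again.
   */
  static Optional<String> launchWithFallback(String destinationHost, String sourceHost,
      Predicate<String> tryLaunch) {
    for (String host : launchOrder(destinationHost, sourceHost)) {
      if (tryLaunch.test(host)) {
        return Optional.of(host);
      }
    }
    return Optional.empty();
  }
}
```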




[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-02-05 Thread Sanil Jain (Confluence)
There's 2 inline comment updates and 1 new edit on this page 
2 inline comment updates 
Sanil Jain resolved 
populated by the client 
 How would the client know whether the uuid is unique?  
Boris Shkolnik  
has a steady lag 
 not clear what the criteria are here.
Sanil Jain  
 The criterion is that the standby container has bootstrapped completely and its lag (watermark) is not increasing (steady)
1 new edit 
Sanil Jain edited this page 
Here's what changed: 
 ... 
API: placeContainer
Description:
 Active container: stops the container process on source-host and starts it
  for a stateless job, on either:
   Destination-host (the destination host can be the source host as well)
   Any host (destination-host = ANY_HOST)
  for a stateful job, on either:
   Destination-host (if specified; the destination host can be the source host as well)
   Standby container (destination-host = STANDBY)
   Any host (destination-host = ANY_HOST)
 Standby container: stops the container process on source-host and starts it on:
   Destination-host (if specified and it matches standby constraints)
   Any host (otherwise, one that matches standby constraints)
Parameters:
 deploymentId: unique identifier of the deployed app for which the action is taken
 processor-id: Samza resource id of the container, e.g. 0, 1, 2
 destination-host: valid hostname / "ANY_HOST" / "STANDBY"
 request-expiry-timeout [optional]: timeout for any resource request to the cluster manager
Status codes: CREATED, BAD_REQUEST, ACCEPTED, IN_PROGRESS, SUCCEEDED, FAILED
Returns: Since this is an async API, nothing is returned; the UUID the client needs to query the status of the request can be looked up by processorId
Failure scenarios: a request to place a container can fail in the following cases:
 When stopping the active container fails; the request is marked failed
 When the requested resources cannot be obtained from the cluster manager; the request is marked failed
 When the stopped active container fails to start on the destination host; the request is marked failed and an attempt is made to start the container on the source host, and if that fails, on ANY_HOST
 ... 
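The destination-host rules in the placeContainer description can be read as a small validation step. The following Java sketch is one possible interpretation, not Samza's code, and its names and checks are assumptions:

* `DestinationHostResolver` and `ContainerKind` are hypothetical.
* An empty result stands for BAD_REQUEST.
* The `meetsStandbyConstraints` predicate is an assumed check, for example that the host does not already run the active container or another replica of it, as in SEP-19.
* STANDBY is assumed to be meaningful only for stateful active containers.
* If a standby's requested host fails the constraints, it is assumed to be replaced by ANY_HOST.

```java
import java.util.Optional;
import java.util.function.Predicate;

/** Hypothetical validation of placeContainer's destination-host parameter. */
final class DestinationHostResolver {
  static final String ANY_HOST = "ANY_HOST";
  static final String STANDBY = "STANDBY";

  enum ContainerKind { ACTIVE_STATELESS, ACTIVE_STATEFUL, STANDBY_CONTAINER }

  private DestinationHostResolver() {}

  /** Returns the effective destination, or empty if the request should be rejected (BAD_REQUEST). */
  static Optional<String> resolve(ContainerKind kind, String destinationHost,
      Predicate<String> meetsStandbyConstraints) {
    if (destinationHost == null || destinationHost.isEmpty()) {
      return Optional.empty();
    }
    switch (kind) {
      case ACTIVE_STATELESS:
        // A concrete host (possibly the source host) or ANY_HOST; a stateless job has no standby.
        return STANDBY.equals(destinationHost) ? Optional.empty() : Optional.of(destinationHost);
      case ACTIVE_STATEFUL:
        // A concrete host, STANDBY (move onto the standby's host) or ANY_HOST.
        return Optional.of(destinationHost);
      case STANDBY_CONTAINER:
        if (STANDBY.equals(destinationHost)) {
          return Optional.empty();
        }
        // Keep the requested host only if it satisfies the standby constraints;
        // otherwise let the allocator pick any host that does.
        boolean keepRequested = !ANY_HOST.equals(destinationHost)
            && meetsStandbyConstraints.test(destinationHost);
        return Optional.of(keepRequested ? destinationHost : ANY_HOST);
      default:
        return Optional.empty();
    }
  }
}
```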
API: containerStatus
Description: Gives the status and info of the container placement request, e.g. whether the container is running or stopped and what control commands have been issued on it
Parameters:
 processor-id: Samza resource id of the container, e.g. 0, 1, 2
 deploymentId: unique identifier of the deployed app for which the action is taken
 uuid: unique identifier of a request
Status codes: BAD_REQUEST
Returns: Status of the container placement action
 ... 
API: controlStandBy
Description: Starts or stops a standby container for the active container
Parameters:
 processor-id: Samza resource id of the container, e.g. 0, 1, 2
 deploymentId: unique identifier of the deployed app for which the action is taken
 uuid: unique identifier of a request
Status codes: CREATED, BAD_REQUEST, ACCEPTED, IN_PROGRESS, SUCCEEDED, FAILED
Returns: UUID for the client to query the status of the request
Architecture
To implement a scalable container placement control system, the proposed solution is divided into two parts:  ...
Code block (java): ContainerPlacementMessage.java
 /**
* Encapsulates the request or response payload information between the ContainerPlacementHandler service and external
* controllers issuing placement actions
*/
public abstract class ContainerPlacementMessage {

public enum StatusCode {
 /**
  * Indicates that the container placement action is created
  */
 CREATED,

 /**
  * Indicates that the container placement action was rejected because request was deemed invalid
  */
 BAD_REQUEST,

 /**
  * Indicates that the container placement action is accepted and waiting to be processed
  */
 ACCEPTED,

 /**
  * Indicates that the container placement action is in

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-02-09 Thread Sanil Jain (Confluence)

There's 1 resolved inline comment on this page

Sanil Jain resolved an inline comment
Highlighted text: References
Comment: Remove references to internal docs.

This message was sent by Atlassian Confluence 7.1.2

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-02-10 Thread Sanil Jain (Confluence)

There's 1 new edit on this page

Sanil Jain edited this page

Here's what changed:

 ... 
 
 This system does not intend to solve the problem of dynamic workload balancing (i.e., Cruise Control); it may act as a building block for one later.
 Solving the canary problem for the YARN-based deployment model is out of scope for this solution; however, the system should be easily extensible to support canaries.
 This system will not have built-in intelligence to find a better host for a container; it will make simple decisions based on the parameters passed by the user.
  SLA / SCALE / LIMITS (Assumptions)  
 
 The AM (ApplicationMaster) of a job serves only one request per container at a time; parallel requests across different containers are still supported. If a control request is underway, any other requests issued on the same container are queued. The same holds for an active container and its standby: if any container placement request is in progress for an active container or its standby replica, all subsequent placement actions on either are queued.
 Queued actions are sorted by the timestamps populated by the client and executed in that order.
 The system should scale to serve different jobs at the same time.
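The queuing assumptions above (one in-flight request per container, an active container and its standby sharing that limit, and execution in client-timestamp order) can be sketched as follows. PlacementRequestQueue, Action, and the groupOf function that maps a processor-id to the group shared by an active container and its standby are hypothetical names; this is a minimal sketch, not the AM's actual queueing code.

```java
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.PriorityQueue;
import java.util.Set;
import java.util.function.Function;

/** Hypothetical per-container queue for placement actions; not Samza's implementation. */
final class PlacementRequestQueue {

  // A placement action on one container, stamped with the client-populated timestamp.
  record Action(String processorId, long clientTimestampMs) {}

  private static final Comparator<Action> BY_CLIENT_TIMESTAMP =
      Comparator.comparingLong(Action::clientTimestampMs);

  // Maps a processor-id to the group it shares with its active/standby counterpart (assumed).
  private final Function<String, String> groupOf;
  private final Map<String, PriorityQueue<Action>> pending = new HashMap<>();
  private final Set<String> inFlight = new HashSet<>();

  PlacementRequestQueue(Function<String, String> groupOf) {
    this.groupOf = groupOf;
  }

  /** Queues an action; it waits behind earlier-stamped actions on the same group. */
  synchronized void submit(Action action) {
    pending
        .computeIfAbsent(groupOf.apply(action.processorId()), g -> new PriorityQueue<>(BY_CLIENT_TIMESTAMP))
        .add(action);
  }

  /** Starts the earliest queued action for the group, unless one is already in flight. */
  synchronized Optional<Action> startNext(String processorId) {
    String group = groupOf.apply(processorId);
    PriorityQueue<Action> queue = pending.get(group);
    if (inFlight.contains(group) || queue == null || queue.isEmpty()) {
      return Optional.empty();
    }
    inFlight.add(group);
    return Optional.of(queue.poll());
  }

  /** Marks the group's in-flight action as finished so the next queued action may start. */
  synchronized void complete(String processorId) {
    inFlight.remove(groupOf.apply(processorId));
  }
}
```

For example, with groupOf = id -> id.split("-")[0] (which assumes standby processor-ids of the form "0-0" for active "0"), an action on "0-0" stamped earlier runs before an action on "0", neither can start while the other is in flight, and an action on container "1" proceeds in parallel.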
 ...

[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-02-11 Thread Sanil Jain (Confluence)

There are 3 resolved inline comments on this page

Sanil Jain resolved an inline comment
Highlighted text: No need to build Authentication since AM runs on hosts which are blacklisted for anyone except the Samza Team
Comment: LinkedIn-specific detail, not true in general.

Sanil Jain resolved an inline comment
Highlighted text: Hot StandBy
Comment: It'd be nice to provide a link to a page that describes this feature.

Sanil Jain resolved an inline comment
Highlighted text: support for Canar
Comment: Please provide a short description of how this will work with Canary.


[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-02-27 Thread Sanil Jain (Confluence)

There's 1 resolved inline comment on this page

Sanil Jain resolved an inline comment
Highlighted text: processorId
Comment: Where are the UUID and deployment ID? In the payload? Can you document the message structure (key and value)?


[CONF] Apache Samza > SEP-22: Container Placements in Samza

2020-06-01 Thread Sanil Jain (Confluence)

There's 1 new edit on this page

Sanil Jain edited this page

Here's what changed:

  Status
  Current state: ACCEPTED (previously UNDER DISCUSSION)
  Discussion thread: http://mail-archives.apache.org/mod_mbox/samza-dev/202001.mbox/browser
  JIRA: SAMZA-2373
  Released: Samza 1.5 (previously TBD)

  Problem
  Samza operates in a multi-tenant environment with cluster managers like Yarn and Mesos, where a single host can run multiple Samza containers. Because cluster managers like Yarn are often configured with soft limits, and Samza has no notion of dynamic workload balancing, a host can end up underperforming, and it is then desirable to move one or more containers from that host to other hosts. Today this is not possible without affecting other jobs on the hot host or manually restarting the affected job. In other use cases, such as resetting the checkpoints of a single container or supporting canary or rolling bounces, the ability to restart a single container or a subset of containers without restarting the whole job is highly desirable.
  ...

This message was sent by Atlassian Confluence 7.5.0