[CONF] Apache Samza > SEP-22: Container Placements in Samza

Sanil Jain (Confluence) Tue, 14 Jan 2020 18:30:34 -0800

There are 2 new edits on this page

Sanil Jain edited this page

Here's what changed:

...

Remove HostAwareContainerAllocator & ContainerAllocator and simplify the container allocator into a lightweight entity that assigns requests to available resources (PR1, PR2)
Introduce ContainerManager, which acts as the brain for validating and issuing any actions on containers in the Job Coordinator, for both active & standby containers (PR)

Transfer state & validation of container launch & expired request handling from ContainerAllocator to ContainerManager
Transfer state & validation for callback handler lifecycle management of the container allocator & resource requests on boot from ClusterResourceManager.Callback (ContainerProcessManager) to ContainerManager

Encapsulate logic and state related to container placement actions like move & restart for active & standby containers in ContainerManager (PR-1, TBD)

It is ContainerManager’s duty to validate any ContainerPlacementRequestMessages & to invalidate messages from a previous deployment incarnation
It is ContainerManager’s duty to write ContainerPlacementResponseMessages to the Metastore so that an external control can query the status of a request
ContainerPlacementMetadata is a metadata holder for container actions (ControlActionMetadata), e.g. request_id, current status, requested resources, etc.
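
The validation and status-reporting duties listed above could be sketched roughly as follows. This is only an illustration under stated assumptions: the class names, fields, status values, and the in-memory map standing in for the Metastore are all hypothetical and are not taken from the Samza codebase.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.UUID;

// Illustrative sketch only: every type here is a hypothetical stand-in, not Samza's API.
class PlacementValidationSketch {

  // Assumed status values; the real set of statuses is not given in the text.
  enum Status { ACCEPTED, BAD_REQUEST }

  // Stand-in for a ContainerPlacementRequestMessage.
  static final class Request {
    final UUID requestId;
    final String deploymentId;    // identifies the deployment incarnation that issued the request
    final String processorId;     // container the action targets
    final String destinationHost; // requested resource

    Request(UUID requestId, String deploymentId, String processorId, String destinationHost) {
      this.requestId = requestId;
      this.deploymentId = deploymentId;
      this.processorId = processorId;
      this.destinationHost = destinationHost;
    }
  }

  // Stand-in for ContainerPlacementMetadata: request id, requested resource, current status.
  static final class Metadata {
    final UUID requestId;
    final String requestedHost;
    final Status status;

    Metadata(UUID requestId, String requestedHost, Status status) {
      this.requestId = requestId;
      this.requestedHost = requestedHost;
      this.status = status;
    }
  }

  // Stand-in for the validating part of ContainerManager.
  static final class Manager {
    private final String currentDeploymentId;
    private final Set<String> knownProcessorIds;
    // In-memory map used in place of the Metastore (assumption for this sketch).
    private final Map<UUID, Metadata> metastore = new HashMap<>();

    Manager(String currentDeploymentId, Set<String> knownProcessorIds) {
      this.currentDeploymentId = currentDeploymentId;
      this.knownProcessorIds = knownProcessorIds;
    }

    // Validates the request and records a response that external tooling can query.
    Status handle(Request request) {
      Status status;
      if (!currentDeploymentId.equals(request.deploymentId)) {
        status = Status.BAD_REQUEST; // message from a previous deployment incarnation
      } else if (!knownProcessorIds.contains(request.processorId)) {
        status = Status.BAD_REQUEST; // unknown container
      } else {
        status = Status.ACCEPTED;
      }
      metastore.put(request.requestId,
          new Metadata(request.requestId, request.destinationHost, status));
      return status;
    }

    // Returns the recorded status, or null if the request id was never seen.
    Status statusOf(UUID requestId) {
      Metadata metadata = metastore.get(requestId);
      return metadata == null ? null : metadata.status;
    }
  }

  public static void main(String[] args) {
    Manager manager = new Manager("deployment-2", Set.of("0", "1"));
    UUID staleRequest = UUID.randomUUID();
    manager.handle(new Request(staleRequest, "deployment-1", "0", "host-b"));
    System.out.println(manager.statusOf(staleRequest));
  }
}
```

Recording a response for rejected requests as well as accepted ones is a choice made for this sketch, so that a caller polling by request id always gets an answer.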

Note: ClusterResourceManager.Callback (ContainerProcessManager) is tightly coupled with ClusterBasedJobCoordinator today. So that this feature can be developed faster, phase 1 of the implementation will include all the proposed changes except moving state & lifecycle management of the container allocator & resource requests on boot from ClusterResourceManager.Callback (ContainerProcessManager) to ContainerManager. Hence ContainerProcessManager will still be tied to ClusterBasedJobCoordinator and will intercept any container placement requests.

2.1 Container Move

2.1.1 Stateless Container Move & Stateful Container Move (without Standby)

...

If the preferred resources cannot be acquired, the active container is never stopped and a failure notification is sent for the ContainerPlacementRequest
If the ContainerPlacementManager is not able to stop the active container (3.1 #1 above fails), the request is marked failed & a failure notification is sent for the ContainerPlacementRequest
If ClusterResourceManager fails to start the stopped active container on the acquired destination host, we attempt to start the container back on the source host and a failure notification is sent for the ContainerPlacementRequest. If the container fails to start on the source host, an attempt is made to start it on ANY_HOST
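
The failure handling described above can be read as a short decision sequence. The sketch below is a hedged illustration of that sequence only: the Cluster interface, the Result type, and the move method are hypothetical names introduced for this example, and the real implementation is asynchronous and callback driven rather than a single blocking method.

```java
// Illustrative sketch only: the types below are hypothetical, not Samza's API.
class MoveFallbackSketch {

  static final String ANY_HOST = "ANY_HOST";

  // Assumed abstraction over the cluster operations a move needs.
  interface Cluster {
    boolean acquire(String host);                   // try to obtain resources on a host
    boolean stop(String processorId);               // try to stop the active container
    boolean start(String processorId, String host); // try to start the container on a host
  }

  enum Outcome { SUCCEEDED, FAILED }

  static final class Result {
    final Outcome outcome;  // what is reported for the placement request
    final String runningOn; // where the container ends up, or null if it could not be started

    Result(Outcome outcome, String runningOn) {
      this.outcome = outcome;
      this.runningOn = runningOn;
    }
  }

  static Result move(Cluster cluster, String processorId, String sourceHost, String destinationHost) {
    // Preferred resources unavailable: the active container is never stopped.
    if (!cluster.acquire(destinationHost)) {
      return new Result(Outcome.FAILED, sourceHost);
    }
    // Stop failed: the request is marked failed (assumes the container keeps running on the source).
    if (!cluster.stop(processorId)) {
      return new Result(Outcome.FAILED, sourceHost);
    }
    // Happy path: the container starts on the destination host.
    if (cluster.start(processorId, destinationHost)) {
      return new Result(Outcome.SUCCEEDED, destinationHost);
    }
    // Start on destination failed: fall back to the source host and report failure.
    if (cluster.start(processorId, sourceHost)) {
      return new Result(Outcome.FAILED, sourceHost);
    }
    // Source host also failed: last attempt on ANY_HOST, still reported as a failed move.
    boolean started = cluster.start(processorId, ANY_HOST);
    return new Result(Outcome.FAILED, started ? ANY_HOST : null);
  }

  public static void main(String[] args) {
    // Example cluster where only the source host can start the container.
    Cluster cluster = new Cluster() {
      public boolean acquire(String host) { return true; }
      public boolean stop(String processorId) { return true; }
      public boolean start(String processorId, String host) { return host.equals("host-a"); }
    };
    Result result = move(cluster, "0", "host-a", "host-b");
    System.out.println(result.outcome + " on " + result.runningOn);
  }
}
```

Note that a fallback start on the source host or ANY_HOST keeps the container running but still reports the move itself as failed, matching the failure notification described in the text.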

Option 3: Stateful without Standby (Spin Up Standby container & then move) (Phase 2) [Stretch]

...
