... With the 0.13.0 release, Samza introduced a flexible deployment model that lets you run it in containerized environments, with resource managers other than YARN, or in the cloud with the proper coordination primitives. It also lets you run Samza as a library within your application. The motivations for running Samza Embedded on Azure are the following:
- Its current dependency on Zookeeper
...
- increases our customers’ reliance on the infrastructure, and does not help with modularity.
...
- Zookeeper is tedious to maintain and does not help in componentization.
- Introducing a coordination service in Azure will help identify issues with the current Job Coordinator design, and validate the functionality that Samza Embedded claims to provide.
- If integrated with the EventHub connector for Brooklin, it will give us an end-to-end system running in Azure, which gives teams at Microsoft more motivation to incorporate Samza into their existing systems.
...
- We will also get all the advantages of moving to the cloud infrastructure.
Proposed Changes
- Implement the AzureJobCoordinator on top of current JobCoordinator.
- Implement the Latch (lock) and Leader functionality with Lease Blobs in Azure. These are pluggable components. A blob in Azure storage is used for storing large amounts of unstructured data. A Lease Blob is an operation that establishes and manages a lock on a blob for write and delete operations. We will use this service to elect the leader when running in Azure.
- Implement the checkpointing mechanism with Azure Storage.
- Integrate all of this with the EventHubSystemProducer and EventHubSystemConsumer.
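The lease-based leader election described above can be illustrated with a minimal sketch. It does not call Azure: the lease is simulated in memory so that the example runs on its own, and every class and method name in it (InMemoryLeaseBlob, LeaseLeaderElector, tryBecomeLeader, and so on) is hypothetical rather than part of Samza or the Azure SDK. The idea shown is only that whichever processor holds an unexpired lease on a shared blob is treated as the leader, and that leadership passes on once the lease expires or is released.

```java
import java.util.Optional;
import java.util.UUID;

/**
 * Illustrative sketch only: an in-memory stand-in for a blob lease,
 * showing how lease ownership can act as a leader lock.
 * None of these names come from Samza or the Azure SDK.
 */
final class LeaseElectionSketch {

    /** Hypothetical in-memory model of a single leasable blob. */
    static final class InMemoryLeaseBlob {
        private String leaseId;        // null when no lease has been granted
        private long expiresAtMillis;  // expiry time of the current lease

        /** Grants a lease if the blob is free or the previous lease has expired. */
        synchronized Optional<String> tryAcquire(long durationMillis, long nowMillis) {
            if (leaseId != null && nowMillis < expiresAtMillis) {
                return Optional.empty(); // another processor holds the lock
            }
            leaseId = UUID.randomUUID().toString();
            expiresAtMillis = nowMillis + durationMillis;
            return Optional.of(leaseId);
        }

        /** Extends the lease; fails unless the caller is the current, unexpired holder. */
        synchronized boolean renew(String id, long durationMillis, long nowMillis) {
            if (leaseId == null || !leaseId.equals(id) || nowMillis >= expiresAtMillis) {
                return false;
            }
            expiresAtMillis = nowMillis + durationMillis;
            return true;
        }

        /** Gives up the lease so that another processor can take over. */
        synchronized boolean release(String id) {
            if (leaseId == null || !leaseId.equals(id)) {
                return false;
            }
            leaseId = null;
            return true;
        }
    }

    /** Hypothetical elector: a processor is leader while it holds the lease. */
    static final class LeaseLeaderElector {
        private final InMemoryLeaseBlob blob;
        private final long leaseMillis;
        private String heldLeaseId; // null when this processor holds no lease

        LeaseLeaderElector(InMemoryLeaseBlob blob, long leaseMillis) {
            this.blob = blob;
            this.leaseMillis = leaseMillis;
        }

        /** Renews an existing lease or tries to acquire a new one. */
        boolean tryBecomeLeader(long nowMillis) {
            if (heldLeaseId != null && blob.renew(heldLeaseId, leaseMillis, nowMillis)) {
                return true;
            }
            heldLeaseId = blob.tryAcquire(leaseMillis, nowMillis).orElse(null);
            return heldLeaseId != null;
        }

        /** Steps down voluntarily by releasing the lease. */
        void resign() {
            if (heldLeaseId != null) {
                blob.release(heldLeaseId);
                heldLeaseId = null;
            }
        }
    }

    public static void main(String[] args) {
        InMemoryLeaseBlob blob = new InMemoryLeaseBlob();
        LeaseLeaderElector p1 = new LeaseLeaderElector(blob, 30_000);
        LeaseLeaderElector p2 = new LeaseLeaderElector(blob, 30_000);
        // Timestamps are passed in explicitly so the behavior is deterministic.
        System.out.println("p1 leader at t=0s:  " + p1.tryBecomeLeader(0));
        System.out.println("p2 leader at t=1s:  " + p2.tryBecomeLeader(1_000));
        System.out.println("p2 leader at t=31s: " + p2.tryBecomeLeader(31_000));
    }
}
```

In this sketch the current leader has to keep calling tryBecomeLeader before its lease runs out; if it stops, for example because the processor crashed, the lease lapses and another processor can acquire it. A real implementation would replace the in-memory class with calls to the storage service and would need to handle network failures and clock differences, which this example ignores.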
... The following interfaces will be implemented for Azure:
- JobCoordinator
  public class AzureJobCoordinator implements JobCoordinator {}
- Latch
  public class AzureLatch implements Latch {}
- LeaderElection
  public class AzureLeaderElector implements LeaderElector {}
The following config values will be introduced for this implementation:
- org.apache.samza.azure.AzureJobCoordinatorFactory for job.coordinator.factory.
- Azure Storage Account Name
- Azure Storage Account Key
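As a rough illustration of how these values might be assembled, the sketch below builds a plain configuration map. Only the job.coordinator.factory key and the org.apache.samza.azure.AzureJobCoordinatorFactory class name come from this proposal. The two storage key names (azure.storage.account.name and azure.storage.account.key) are placeholders invented for the example, since the proposal lists the values but not their config keys, and the class AzureCoordinatorConfigSketch is likewise hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Illustrative sketch only: assembles the config values listed above into a map. */
final class AzureCoordinatorConfigSketch {
    // Taken from the proposal.
    static final String FACTORY_KEY = "job.coordinator.factory";
    static final String FACTORY_CLASS = "org.apache.samza.azure.AzureJobCoordinatorFactory";

    // Hypothetical key names: the proposal does not say what these keys are called.
    static final String ACCOUNT_NAME_KEY = "azure.storage.account.name";
    static final String ACCOUNT_KEY_KEY = "azure.storage.account.key";

    /** Builds the config map, rejecting missing storage credentials early. */
    static Map<String, String> build(String accountName, String accountKey) {
        if (accountName == null || accountName.isEmpty()) {
            throw new IllegalArgumentException("storage account name is required");
        }
        if (accountKey == null || accountKey.isEmpty()) {
            throw new IllegalArgumentException("storage account key is required");
        }
        Map<String, String> config = new LinkedHashMap<>();
        config.put(FACTORY_KEY, FACTORY_CLASS);
        config.put(ACCOUNT_NAME_KEY, accountName);
        config.put(ACCOUNT_KEY_KEY, accountKey);
        return config;
    }

    public static void main(String[] args) {
        // Placeholder credentials; real values would come from a secure source.
        Map<String, String> config = build("examplestorageaccount", "example-key");
        // Print the keys only, so the secret is never echoed.
        config.keySet().forEach(System.out::println);
    }
}
```

Failing fast on missing credentials is a design choice of this sketch, not something the proposal specifies.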
Implementation and Test Plan
... The changes made in this proposal will be backward compatible. A new config value for the JobCoordinatorFactory will be introduced. The client only needs to change the config file and set job.coordinator.factory to org.apache.samza.azure.AzureJobCoordinatorFactory.
Rejected Alternatives
NA
Future Work
- Implementing the checkpointing mechanism with Azure Storage.