...
-
Implement the AzureJobCoordinator on top of current JobCoordinator. Implement
-
Change the current Latch (lock) and implementation to a distributed lock based implementation, and implement it for Zookeeper.
-
Implement the distributed Lock and Leader functionality with Lease Blobs in Azure. These are pluggable components. A blob in Azure storage is used for storing large amounts of unstructured data. A Lease Blob is an operation that establishes and manages a lock on a blob for write and delete operations. We will use this service to elect the leader when running in Azure as follows:
-
All the processors will try to acquire a lease on the same shared blob in Azure storage
-
The processor that acquires the lease becomes the leader and automatically renews the lease at a constant time interval.
- In parallel, all the worker processors keep trying to acquire a lease on the same blob at constant time interval (T1). Acquiring a lease on a blob that has already been leased by someone results in failure.
- When the leader dies, it results in release of the lease, following which one of the worker processes will acquire a new lease on the blob. This ensures that the system will not exist without a leader for more than (T1 + lease acquisition) time.
- Implement a barrier in Azure. According to my current research, Azure does not have a watch/subscribe functionality in Azure storage. This makes the barrier implementation a little challenging. Any change in the list of processors in the system will by monitoring each processor's heartbeats. Once the leader is elected, every processor will heartbeat to the leader at regular time intervals. If the leader does not hear from a processor after a particular time, it will assume that that processor has died and call the onProcessorChange() function. This will initiate the generation of a new job model which the leader will publish to the blob. The leader will also keep track of whether every processor has the updated version of the job model. In the background, every worker processor continuously polls the blob to check if any data on it has been modified since the last time it was retrieved. If true, it stops all current processing and retrieves the new job model. All worker processes start processing once they validate with the leader that every processor has got the same updated version of the job model.
-
Implement the checkpointing mechanism with Azure Storage.
-
Integrate all of this with the EventHubSystemProducer and EventHubSystemConsumer.
... The following interfaces will be implemented for Azure:
The following config values will be introduced for this implementation:
- org.apache.samza.azure.AzureJobCoordinatorFactory for job.coordinator.factory.
- Azure Storage Account Name
- Azure Storage Account Key
Implementation and Test Plan ... |