Himanshu-g81 opened a new pull request, #2278:
URL: https://github.com/apache/phoenix/pull/2278
High-level description of the major Replication Replay components added in this PR:

1. **ReplicationLogReplayService** - A singleton with a single thread. Every 60 seconds (configurable), the thread fetches all HA groups and starts Replication Replay for each one via ReplicationReplay.get(conf, replicationGroup).startReplay(). Note that startReplay() of ReplicationReplay is idempotent. The service is hooked into the RS start/stop path in PhoenixRegionServerEndpoint.java.
2. **ReplicationReplay** - Handles the replication replay lifecycle for a single HA group. It initializes the file system from which replay is done for this HA group. The init method also initializes ReplicationReplayLogDiscovery, along with the group's ReplicationReplayStateTracker and ReplicationLogReplayFileTracker. The responsibilities of these three components are described below.
3. **ReplicationStateTracker** - Abstract class that tracks the replication state of a single HA group (last round in sync time, etc.). Its one implementation, ReplicationReplayStateTracker, is added for Replication Replay on the standby cluster. A similar implementation will be required on the primary cluster for store-and-forward mode.
4. **ReplicationLogFileTracker** - Abstract class that handles all file system interactions (getNewFiles, markInProgress, markCompleted, etc.). It currently has one implementation, for the standby cluster (ReplicationLogReplayFileTracker), which overrides the directory (the IN directory) and the metric source. A similar implementation can be added for store-and-forward mode, with OUT as the directory and a custom metric source.
5. **ReplicationLogDiscovery** - Abstract class containing the logic for processing files round by round. It holds a ReplicationLogFileTracker and a ReplicationStateTracker. It creates a thread pool (all properties configurable, e.g. thread count, scheduling interval) to process the log files round by round ([details](https://docs.google.com/document/d/1usap8PCYFU0Z4orznUPvk0tnSv0X-vnbgD_QZejrnv0/edit?tab=t.0#bookmark=id.kahc07qfainp) - this is a Salesforce internal doc; will update once the design is published). The process method is abstract. The standby cluster replay implementation (ReplicationReplayLogDiscovery) applies mutations on the target. A similar implementation can be added for store-and-forward mode that simply copies files to the standby cluster.
6. **ReplicationShardDirectoryManager** - Encapsulates shard directory management for both the active and DR clusters. Only the root directory needs to be given during initialization. Changes to leverage it in the source are also part of this PR.
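A minimal sketch of the polling loop described for **ReplicationLogReplayService**. It assumes that the HA group lookup and the ReplicationReplay.get(conf, group).startReplay() call can be abstracted as a `Supplier` and a `Consumer`. `ReplayServiceSketch`, `runOnce`, `start` and `stop` are hypothetical names, and the singleton wiring and PhoenixRegionServerEndpoint hooks are left out. The sketch shows why an idempotent startReplay() makes a blind periodic pass over every group safe.

```java
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;
import java.util.function.Supplier;

// Hypothetical sketch of the ReplicationLogReplayService loop; names are illustrative.
final class ReplayServiceSketch {
    private final Supplier<List<String>> haGroups;  // stand-in for "get all HA groups"
    private final Consumer<String> startReplay;     // stand-in for ReplicationReplay.get(conf, g).startReplay()
    private final long intervalSeconds;             // 60 in the PR description, configurable
    private ScheduledExecutorService executor;

    ReplayServiceSketch(Supplier<List<String>> haGroups, Consumer<String> startReplay, long intervalSeconds) {
        this.haGroups = haGroups;
        this.startReplay = startReplay;
        this.intervalSeconds = intervalSeconds;
    }

    /** Would be called from the region server start path. */
    synchronized void start() {
        if (executor != null) return; // already running
        executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleWithFixedDelay(this::runOnce, 0, intervalSeconds, TimeUnit.SECONDS);
    }

    /** One pass over all HA groups; repeating it is safe because startReplay is idempotent. */
    void runOnce() {
        List<String> groups;
        try {
            groups = haGroups.get();
        } catch (RuntimeException e) {
            System.err.println("Could not list HA groups: " + e);
            return; // try again on the next tick
        }
        for (String group : groups) {
            try {
                startReplay.accept(group);
            } catch (RuntimeException e) {
                // An exception escaping a scheduled task would cancel all future runs,
                // and one bad group should not block the others.
                System.err.println("Could not start replay for " + group + ": " + e);
            }
        }
    }

    /** Would be called from the region server stop path. */
    synchronized void stop() {
        if (executor != null) {
            executor.shutdownNow();
            executor = null;
        }
    }
}
```

Catching exceptions per group matters with `scheduleWithFixedDelay`: if one run throws, the executor suppresses every later run of that task.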
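The idempotent startReplay() contract of **ReplicationReplay** can be illustrated with a per-group instance cache and a compare-and-set flag. This is a sketch under assumptions. `ReplicationReplaySketch`, `initCount` and `stopReplay` are hypothetical. The real `get` also takes a `conf` argument, which is dropped here.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the per-HA-group lifecycle; the real get() also takes a configuration.
final class ReplicationReplaySketch {
    // One shared instance per HA group, created on first use.
    private static final ConcurrentMap<String, ReplicationReplaySketch> INSTANCES =
            new ConcurrentHashMap<>();

    private final String haGroup;
    private final AtomicBoolean started = new AtomicBoolean(false);
    private final AtomicInteger initCount = new AtomicInteger(); // illustration only

    private ReplicationReplaySketch(String haGroup) {
        this.haGroup = haGroup;
    }

    static ReplicationReplaySketch get(String haGroup) {
        return INSTANCES.computeIfAbsent(haGroup, ReplicationReplaySketch::new);
    }

    /** Idempotent: the first successful call initializes, later calls return immediately. */
    void startReplay() {
        if (!started.compareAndSet(false, true)) {
            return; // already started (or being started by another caller)
        }
        try {
            init();
        } catch (RuntimeException e) {
            started.set(false); // let the next periodic call retry
            throw e;
        }
    }

    private void init() {
        // Per the PR description, the group's file system, log discovery, state tracker
        // and file tracker would be set up here; the sketch only counts calls.
        initCount.incrementAndGet();
    }

    void stopReplay() {
        started.set(false);
    }

    String haGroup() { return haGroup; }
    int initCount() { return initCount.get(); }
}
```

Resetting the flag when init() fails is a design choice of this sketch. The service's next periodic pass then retries the group instead of leaving it marked as started but never initialized.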
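One way to picture **ReplicationLogFileTracker** is as a template-method base class whose subclasses override only the directory: `IN` for standby replay and `OUT` for store-and-forward. In this hypothetical sketch, the in-progress/completed state lives in an in-memory map and the local file system stands in for the cluster file system. The PR's tracker may record state differently, and the metric-source override is omitted. All class names are illustrative.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical sketch: state is kept in memory; the PR's tracker may record it differently.
abstract class LogFileTrackerSketch {
    enum State { IN_PROGRESS, COMPLETED }

    private final Path root;
    private final Map<Path, State> states = new ConcurrentHashMap<>();

    LogFileTrackerSketch(Path root) { this.root = root; }

    /** Subclasses choose the directory, e.g. "IN" on standby or "OUT" for store-and-forward. */
    protected abstract String getDirectoryName();

    Path getDirectory() { return root.resolve(getDirectoryName()); }

    /** Regular files in the tracked directory that nobody has claimed yet, in name order. */
    List<Path> getNewFiles() throws IOException {
        Path dir = getDirectory();
        if (!Files.isDirectory(dir)) return List.of();
        try (Stream<Path> listing = Files.list(dir)) {
            return listing.filter(Files::isRegularFile)
                    .filter(p -> !states.containsKey(p))
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    /** Returns false if the file was already claimed by another worker. */
    boolean markInProgress(Path file) {
        return states.putIfAbsent(file, State.IN_PROGRESS) == null;
    }

    void markCompleted(Path file) {
        states.put(file, State.COMPLETED);
    }
}

final class ReplayFileTrackerSketch extends LogFileTrackerSketch {
    ReplayFileTrackerSketch(Path root) { super(root); }
    @Override protected String getDirectoryName() { return "IN"; }
}

final class StoreAndForwardFileTrackerSketch extends LogFileTrackerSketch {
    StoreAndForwardFileTrackerSketch(Path root) { super(root); }
    @Override protected String getDirectoryName() { return "OUT"; }
}
```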
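The round-by-round flow of **ReplicationLogDiscovery**, together with the "last round in sync" state kept by **ReplicationStateTracker**, might look roughly like the following. Everything here is hypothetical and made up for illustration: the class names, the `release` method that hands a failed file back for retry, and the rule that the in-sync marker advances only when every file in a round succeeds.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Set;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical stand-in for ReplicationStateTracker: only "last round in sync" is modelled.
abstract class StateTrackerSketch {
    private final AtomicLong lastRoundInSync = new AtomicLong(-1);
    long getLastRoundInSync() { return lastRoundInSync.get(); }
    void setLastRoundInSync(long round) { lastRoundInSync.set(round); }
}

final class ReplayStateTrackerSketch extends StateTrackerSketch {}

// Hypothetical stand-in for ReplicationLogFileTracker; release() is an illustrative addition.
interface FileSourceSketch {
    List<String> getNewFiles();
    boolean markInProgress(String file);
    void markCompleted(String file);
    void release(String file); // hand a failed file back so a later round retries it
}

final class InMemoryFileSource implements FileSourceSketch {
    private final Set<String> pending = ConcurrentHashMap.newKeySet();
    private final Set<String> inProgress = ConcurrentHashMap.newKeySet();
    void add(String file) { pending.add(file); }
    public List<String> getNewFiles() {
        List<String> files = new ArrayList<>(pending);
        Collections.sort(files);
        return files;
    }
    public boolean markInProgress(String file) {
        if (!pending.remove(file)) return false; // already claimed elsewhere
        inProgress.add(file);
        return true;
    }
    public void markCompleted(String file) { inProgress.remove(file); }
    public void release(String file) { if (inProgress.remove(file)) pending.add(file); }
}

abstract class LogDiscoverySketch {
    private final FileSourceSketch files;
    private final StateTrackerSketch state;
    private final ExecutorService workers; // processes the files of one round in parallel
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    private final AtomicLong round = new AtomicLong();

    LogDiscoverySketch(FileSourceSketch files, StateTrackerSketch state, int threads) {
        this.files = files;
        this.state = state;
        this.workers = Executors.newFixedThreadPool(threads);
    }

    /** Standby replay would apply mutations here; store-and-forward would copy the file. */
    protected abstract void process(String file) throws Exception;

    /** One round: claim new files, process them in parallel, advance the marker on full success. */
    boolean runRound() throws InterruptedException {
        long r = round.incrementAndGet();
        List<Callable<Void>> tasks = new ArrayList<>();
        for (String f : files.getNewFiles()) {
            if (!files.markInProgress(f)) continue;
            tasks.add(() -> {
                try {
                    process(f);
                    files.markCompleted(f);
                } catch (Exception e) {
                    files.release(f);
                    throw e;
                }
                return null;
            });
        }
        boolean allOk = true;
        for (Future<Void> result : workers.invokeAll(tasks)) {
            try {
                result.get();
            } catch (ExecutionException e) {
                allOk = false;
            }
        }
        if (allOk) state.setLastRoundInSync(r);
        return allOk;
    }

    /** Runs a round every intervalMs (the "scheduling interval" property). */
    void start(long intervalMs) {
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                runRound();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } catch (RuntimeException e) {
                System.err.println("Round failed, retrying next interval: " + e);
            }
        }, 0, intervalMs, TimeUnit.MILLISECONDS);
    }

    void shutdown() {
        scheduler.shutdownNow();
        workers.shutdownNow();
    }
}
```

In this sketch, the scheduler fires rounds at the configured interval, and the fixed pool bounds how many files of a single round are processed concurrently. Because `invokeAll` waits for every task, a round finishes completely before the next delay starts.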
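**ReplicationShardDirectoryManager** only needs a root directory, and the sketch below shows that shape. The PR does not describe the sharding rule itself. The following are all assumptions for illustration: mapping a timestamp to `(timestamp / roundDurationMs) mod shardCount`, the `shard-NNN` directory naming, and the class and method names.

```java
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the time-bucket sharding rule and "shard-NNN" naming are assumptions.
final class ShardDirectoryManagerSketch {
    private final Path root; // the only location callers have to supply
    private final int shardCount;
    private final long roundDurationMs;

    ShardDirectoryManagerSketch(Path root, int shardCount, long roundDurationMs) {
        if (shardCount <= 0 || roundDurationMs <= 0) {
            throw new IllegalArgumentException("shardCount and roundDurationMs must be positive");
        }
        this.root = root;
        this.shardCount = shardCount;
        this.roundDurationMs = roundDurationMs;
    }

    /** Maps a write timestamp to a shard; floorDiv/floorMod keep negative inputs well-defined. */
    int shardIndex(long timestampMs) {
        long bucket = Math.floorDiv(timestampMs, roundDurationMs);
        return (int) Math.floorMod(bucket, (long) shardCount);
    }

    Path shardDirectory(int index) {
        return root.resolve(String.format("shard-%03d", index));
    }

    Path shardDirectoryFor(long timestampMs) {
        return shardDirectory(shardIndex(timestampMs));
    }

    /** Every shard directory, e.g. for a reader that scans all of them in a round. */
    List<Path> allShardDirectories() {
        List<Path> dirs = new ArrayList<>(shardCount);
        for (int i = 0; i < shardCount; i++) {
            dirs.add(shardDirectory(i));
        }
        return dirs;
    }
}
```

Deriving every path from the root keeps writers (active cluster) and readers (DR cluster) in agreement, as long as both sides are built with the same shard count and round duration.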
