devmadhuu commented on PR #10074: URL: https://github.com/apache/ozone/pull/10074#issuecomment-4253694369
> @devmadhuu Thanks for working on this. A few points to consider:
>
> 1. We support moving a container back from deleted to closed / quasi-closed based on the state reported by the DN.
> 2. Open containers can be synced incrementally based on the last synced container ID, since new containers are created in increasing ID order. Closed / quasi-closed containers need a full sync, so the full sync can run at an interval of 3 hours or more.
> 3. Closing may not need to be handled, as it is a temporary state lasting a few minutes; the DN sync can cover it.
> 4. For a stale DN or a volume failure, there can be a sudden spike in container mismatches, such as containers moving from the open state to the closing state. We need to consider whether a full DB sync is warranted for the open-container difference; it may not be required. For the quasi-closed/closed states, we are syncing container IDs anyway.
>
> Do we really need a full DB sync for quasi-closed/closed only, or also for the open container state?

1. The OPEN container sync works much like the `CLOSED` container sync did earlier. The difference is that it first checks whether an OPEN container is missing in Recon, and only then applies the logic that adds the container, so it is incremental in that sense. It still fetches all container IDs in the OPEN state available at SCM, in batches (the batch size is determined dynamically from the RPC message size). That part was already optimized in another PR, which now fetches only the list of container IDs in batches instead of the full `containerInfo` objects, which were heavy. The RPC call is now lightweight: fetching 4-5 million container IDs in multiple batches finishes in under ~20 seconds, with at most 8 RPC calls.
2. The `CLOSED` container sync does the same in batches, with extra checks in `syncClosedContainers` -> `processSyncedClosedContainer`. Kindly have a look at this method to understand the flow: https://github.com/devmadhuu/ozone/blob/c900adde6560660def18a68b748b45a6e21daace/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconStorageContainerSyncHelper.java#L244

The whole sync is scheduled to run every 1 hour, and the type of sync is decided in https://github.com/devmadhuu/ozone/blob/c900adde6560660def18a68b748b45a6e21daace/hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/scm/ReconStorageContainerSyncHelper.java#L171 . There we decide based on the non-OPEN and OPEN container drift counts.
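
For readers who want a concrete picture of the OPEN container flow described above, here is a minimal sketch of paging through container IDs in batches and keeping only the ones Recon does not know yet. It is not the code in `ReconStorageContainerSyncHelper`: the class name, `findMissingOpenContainers`, `inMemoryScm`, and the `fetchBatch` callback are all hypothetical, SCM is replaced by an in-memory set, and the batch size is a fixed number here rather than being derived from the RPC message size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.function.BiFunction;

public class OpenContainerSyncSketch {

  /**
   * Pages through OPEN container IDs and returns those missing from Recon.
   * fetchBatch.apply(startId, batchSize) is a hypothetical stand-in for the
   * SCM RPC: it returns up to batchSize IDs >= startId in ascending order.
   * batchSize must be positive.
   */
  public static List<Long> findMissingOpenContainers(
      BiFunction<Long, Integer, List<Long>> fetchBatch,
      Set<Long> knownToRecon,
      int batchSize) {
    List<Long> missing = new ArrayList<>();
    long startId = 1L;
    while (true) {
      List<Long> batch = fetchBatch.apply(startId, batchSize);
      for (long id : batch) {
        // Only containers that Recon has not seen need further handling.
        if (!knownToRecon.contains(id)) {
          missing.add(id);
        }
      }
      // A short (or empty) batch means SCM has no more IDs to return.
      if (batch.size() < batchSize) {
        break;
      }
      // Resume right after the last ID of this batch.
      startId = batch.get(batch.size() - 1) + 1;
    }
    return missing;
  }

  /** In-memory stand-in for SCM, for illustration only. */
  public static BiFunction<Long, Integer, List<Long>> inMemoryScm(
      TreeSet<Long> openIds) {
    return (startId, limit) -> {
      List<Long> out = new ArrayList<>();
      for (Long id : openIds.tailSet(startId, true)) {
        if (out.size() == limit) {
          break;
        }
        out.add(id);
      }
      return out;
    };
  }

  public static void main(String[] args) {
    TreeSet<Long> scmOpen =
        new TreeSet<>(List.of(1L, 2L, 3L, 5L, 8L, 9L, 10L));
    Set<Long> recon = Set.of(1L, 2L, 5L, 9L);
    // Three batches of at most 3 IDs each; prints [3, 8, 10]
    System.out.println(
        findMissingOpenContainers(inMemoryScm(scmOpen), recon, 3));
  }
}
```

Because only IDs cross the wire in this sketch, each batch stays small even when the total number of containers is large, which is the property the comment attributes to the lightweight RPC call.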
