prateekm commented on code in PR #1682:
URL: https://github.com/apache/samza/pull/1682#discussion_r1297501881


##########
samza-core/src/main/scala/org/apache/samza/storage/ContainerStorageManager.java:
##########
@@ -319,6 +319,19 @@ private Map<TaskName, Checkpoint> restoreStores() throws 
InterruptedException {
       Map<TaskName, Checkpoint> newTaskCheckpoints = 
initRestoreAndNewCheckpointFuture.get();
       taskCheckpoints.putAll(newTaskCheckpoints);
       LOG.debug("Post init and restore checkpoints is: {}. NewTaskCheckpoints 
are: {}", taskCheckpoints, newTaskCheckpoints);
+      // Stop all persistent stores after restoring. Certain persistent stores 
opened in BulkLoad mode are compacted
+      // on stop, so paralleling stop() also parallelizes their compaction (a 
time-intensive operation).

Review Comment:
   This is no longer parallel. That can be a big performance bottleneck for 
jobs restoring from Kafka changelogs since compaction can be slow.



##########
samza-core/src/main/scala/org/apache/samza/storage/ContainerStorageManagerRestoreUtil.java:
##########
@@ -338,7 +327,9 @@ private static BlobStoreManager getBlobStoreManager(Config 
config, ExecutorServi
     BlobStoreConfig blobStoreConfig = new BlobStoreConfig(config);
     String blobStoreManagerFactory = 
blobStoreConfig.getBlobStoreManagerFactory();
     BlobStoreManagerFactory factory = 
ReflectionUtil.getObj(blobStoreManagerFactory, BlobStoreManagerFactory.class);
-    return factory.getRestoreBlobStoreManager(config, executor);
+    BlobStoreManager blobStoreManager = 
factory.getRestoreBlobStoreManager(config, executor);
+    blobStoreManager.init();

Review Comment:
   This method should not init this, it's caller should.
   
   Where is this and BlobStoreBackupManager being closed? Should also be closed 
by the caller.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to