devmadhuu commented on code in PR #10532:
URL: https://github.com/apache/ozone/pull/10532#discussion_r3434144018


##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/persistence/ContainerHealthSchemaManager.java:
##########
@@ -205,28 +205,75 @@ public void replaceUnhealthyContainerRecordsAtomically(
 
   private int deleteScmStatesForContainers(DSLContext dslContext,
       List<Long> containerIds) {
+    if (containerIds.isEmpty()) {
+      return 0;
+    }
+
+    List<Long> sortedIds = containerIds.stream()
+        .distinct()
+        .sorted()
+        .collect(Collectors.toList());
+
     int totalDeleted = 0;
+    List<Long> inClauseBatch = new ArrayList<>(MAX_IN_CLAUSE_CHUNK_SIZE);
+
+    for (int i = 0; i < sortedIds.size(); ) {
+      int rangeStart = i;

Review Comment:
   The code below appears to assume that the unhealthy container IDs in a real 
cluster always form one continuous sequence. That assumption seems incorrect: 
real container IDs may not be contiguous.
   
     Consider this input:
   
     `1, 2, 4, 5, 7, 8, 10, 11`
   
     The PR sees four small continuous ranges and executes:
   
     BETWEEN 1 AND 2
     BETWEEN 4 AND 5
     BETWEEN 7 AND 8
     BETWEEN 10 AND 11
   
     That means four separate DELETE statements.
   
     The old implementation could delete all eight IDs using one statement:
   
     `WHERE container_id IN (1, 2, 4, 5, 7, 8, 10, 11)`
   
     With a larger realistic list containing many small pairs, the difference 
could become:
   
     Old code: 50 DELETE statements
     New code: 10,000 DELETE statements
   
     Each statement must be compiled and executed by Derby. Consequently, 
production could become significantly slower even though this test becomes 
faster.
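
To make the statement count concrete, here is a minimal sketch (not the PR's code) that counts how many DELETE statements each strategy would issue for the example input. The class and method names are hypothetical, and the chunk size of 1000 is an assumed value because the real `MAX_IN_CLAUSE_CHUNK_SIZE` is not shown in this diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only; not part of ContainerHealthSchemaManager.
final class RangeVsInClause {

  /** Splits ascending, distinct IDs into maximal contiguous [start, end] ranges. */
  static List<long[]> contiguousRanges(List<Long> sortedIds) {
    List<long[]> ranges = new ArrayList<>();
    int i = 0;
    while (i < sortedIds.size()) {
      long start = sortedIds.get(i);
      long end = start;
      // Extend the range while the next ID is exactly end + 1.
      while (i + 1 < sortedIds.size() && sortedIds.get(i + 1) == end + 1) {
        end = sortedIds.get(++i);
      }
      ranges.add(new long[] {start, end});
      i++;
    }
    return ranges;
  }

  /** Number of IN-clause statements needed for idCount IDs (ceiling division). */
  static int inClauseStatements(int idCount, int chunkSize) {
    return (idCount + chunkSize - 1) / chunkSize;
  }

  public static void main(String[] args) {
    List<Long> ids = List.of(1L, 2L, 4L, 5L, 7L, 8L, 10L, 11L);
    // One BETWEEN statement per contiguous range: prints 4.
    System.out.println("BETWEEN statements: " + contiguousRanges(ids).size());
    // Assumed chunk size of 1000: prints 1.
    System.out.println("IN statements: " + inClauseStatements(ids.size(), 1000));
  }
}
```

The range count grows with the number of gaps in the input, while the IN-clause count grows only with the number of IDs divided by the chunk size.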
   
   The test, however, only uses this input:

   `1, 2, 3, 4, ... 200,000`
   
     That is the best possible input for BETWEEN.
   
     It does not test inputs such as:
   
     1, 2, 10, 11, 20, 21, ...
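
A test could cover this case with a generator along the following lines. This is only a sketch: `sparsePairs`, `countRuns`, and the stride of 10 are hypothetical, so it produces `1, 2, 11, 12, 21, 22, ...`, which is in the spirit of the sequence above rather than identical to it. The chunk size of 400 is an assumption picked so that the counts line up with the 50 versus 10,000 figures above; the real `MAX_IN_CLAUSE_CHUNK_SIZE` is not shown in this diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test-input helper for illustration only.
final class SparsePairInput {

  /** Builds pairCount pairs (base, base + 1) whose bases are stride apart; use stride >= 3. */
  static List<Long> sparsePairs(int pairCount, long stride) {
    List<Long> ids = new ArrayList<>(pairCount * 2);
    for (int p = 0; p < pairCount; p++) {
      long base = 1 + p * stride;
      ids.add(base);
      ids.add(base + 1);
    }
    return ids;
  }

  /** Counts maximal contiguous runs in ascending, distinct IDs. */
  static int countRuns(List<Long> sortedIds) {
    int runs = 0;
    for (int i = 0; i < sortedIds.size(); i++) {
      // A new run starts at the first ID and after every gap.
      if (i == 0 || sortedIds.get(i) != sortedIds.get(i - 1) + 1) {
        runs++;
      }
    }
    return runs;
  }

  public static void main(String[] args) {
    List<Long> ids = sparsePairs(10_000, 10);
    int chunkSize = 400; // assumed value, see the note above
    // One BETWEEN statement per run: prints 10000.
    System.out.println("BETWEEN statements: " + countRuns(ids));
    // Ceiling of 20000 / 400: prints 50.
    System.out.println("IN statements: " + (ids.size() + chunkSize - 1) / chunkSize);
  }
}
```

Running the existing benchmark against both the contiguous input and an input like this one would show whether the BETWEEN path regresses on fragmented ID sets.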



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

