ArafatKhan2198 commented on code in PR #9994:
URL: https://github.com/apache/ozone/pull/9994#discussion_r3121818241


##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/persistence/ContainerHealthSchemaManager.java:
##########
@@ -381,6 +382,73 @@ public List<UnhealthyContainerRecord> 
getUnhealthyContainers(
     }
   }
 
+  /**
+   * Returns a streaming cursor over unhealthy container records.
+   * Caller MUST close the cursor.
+   *
+   * Final Complete Query Examples:
+   * Example 1: Export 50,000 MISSING containers, starting after container ID 
12345
+   *
+   * SELECT * FROM unhealthy_containers
+   * WHERE container_state = 'MISSING'
+   *   AND container_id > 12345
+   * ORDER BY container_id ASC
+   * LIMIT 50000
+   * JDBC will fetch 10,000 rows at a time (5 batches total)
+   *
+   * Example 2: Export all 100,000 containers (all states), no pagination
+   *
+   * SELECT * FROM unhealthy_containers
+   * ORDER BY container_state ASC, container_id ASC
+   * LIMIT 100000
+   * JDBC will fetch 10,000 rows at a time (10 batches total)
+   *
+   * @param state filter by state, or null for all states
+   * @param limit max records to return, 0 = unlimited
+   * @param prevKey previous container ID to skip, for pagination
+   * @return Cursor returning UnhealthyContainersRecord
+   */
+  public Cursor<UnhealthyContainersRecord> getUnhealthyContainersCursor(
+      UnHealthyContainerStates state, int limit, long prevKey) {
+    DSLContext dslContext = containerSchemaDefinition.getDSLContext();
+
+    // In plain SQL: SELECT * FROM unhealthy_containers
+    org.jooq.SelectQuery<UnhealthyContainersRecord> query = 
dslContext.selectFrom(UNHEALTHY_CONTAINERS).getQuery();
+
+    if (state != null) {
+      // Filtering by ONE specific state (e.g., state = "MISSING")
+      // WHERE container_state = 'MISSING'
+      
query.addConditions(UNHEALTHY_CONTAINERS.CONTAINER_STATE.eq(state.toString()));
+      if (prevKey > 0) {
+        // AND container_id > 12345
+        query.addConditions(UNHEALTHY_CONTAINERS.CONTAINER_ID.gt(prevKey));
+      }
+      // ORDER BY container_id ASC
+      query.addOrderBy(UNHEALTHY_CONTAINERS.CONTAINER_ID.asc());
+    } else {
+      // Exporting ALL states (state = null)
+      if (prevKey > 0) {
+        // WHERE container_id > 12345
+        query.addConditions(UNHEALTHY_CONTAINERS.CONTAINER_ID.gt(prevKey));
+      }
+      // ORDER BY container_state ASC, container_id ASC
+      query.addOrderBy(
+          UNHEALTHY_CONTAINERS.CONTAINER_STATE.asc(),
+          UNHEALTHY_CONTAINERS.CONTAINER_ID.asc()
+      );
+    }
+
+    if (limit > 0) {
+      query.addLimit(limit);
+    }
+
+    // This doesn't change the SQL query.
+    // It tells JDBC: "When you fetch data, bring me 10,000 rows at a time, 
not one-by-one"
+    query.fetchSize(10000);

Review Comment:
   Hey @devmadhuu, A larger fetch size means fewer round-trips to the database, 
not more. Without this setting, Derby/JDBC defaults to fetching as few as 1 - 
10 rows per round-trip, which means 1 million+ round-trips for a 10M row 
export. With fetchSize(10000), that drops to just 1,000 round-trips strictly 
faster.
   
   If you'd like even fewer round-trips for very large exports, I can increase 
it to fetchSize(50000) (200 round-trips for 10M rows), at the cost of slightly 
more memory per batch (~50K rows in the JDBC buffer at a time, roughly a few 
MB). Happy to bump it up if you think that's better.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to