Re: [PR] HDDS-14913. Implement Scalable CSV Export for Unhealthy Containers in Recon UI. [ozone]

via GitHub Sun, 19 Apr 2026 23:54:09 -0700


ArafatKhan2198 commented on code in PR #9994:
URL: https://github.com/apache/ozone/pull/9994#discussion_r3108772661



##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/persistence/ContainerHealthSchemaManager.java:
##########
@@ -381,6 +382,35 @@ public List<UnhealthyContainerRecord> getUnhealthyContainers(
     }
   }
 
+  /**
+   * Returns a streaming cursor over unhealthy container records.
+   * Caller MUST close the cursor.
+   *
+   * @param state filter by state, or null for all states
+   * @param limit max records to return, 0 = unlimited
+   * @return Cursor returning UnhealthyContainersRecord
+   */
+  public Cursor<UnhealthyContainersRecord> getUnhealthyContainersCursor(
+      UnHealthyContainerStates state, int limit) {

Review Comment:
   Hi @devmadhuu. You are absolutely right that 4M records will take longer to 
transfer than 1M records (in my local tests with 5M records, it took about 9.4 
seconds total).
   
   However, StreamingOutput changes when the response starts: the API does not 
wait for the database to fetch all 4M records before responding.
   
   Because we are using a database Cursor, the Time-To-First-Byte (TTFB) is 
almost instant (~20 milliseconds), regardless of whether we are exporting 10K 
or 5M records. The HTTP response begins immediately, and the browser starts 
downloading the file right away. The connection simply stays open while the 
data streams, exactly like downloading a large file from an S3 bucket or a 
standard web server.
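   
   A minimal sketch of the idea, using only the JDK so it runs standalone. It 
is not the code in this PR: `CsvStreamSketch`, `Row`, `streamCsv`, the column 
names, and `flushEvery` are hypothetical, and a plain `Iterator` stands in for 
the database cursor. Rows are written as they are pulled, so nothing is 
buffered in full before the first bytes go out.

```java
import java.io.BufferedWriter;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class CsvStreamSketch {

  /** Hypothetical stand-in for one unhealthy-container row. */
  static final class Row {
    final long containerId;
    final String state;

    Row(long containerId, String state) {
      this.containerId = containerId;
      this.state = state;
    }
  }

  /**
   * Pulls rows one at a time from the iterator (the cursor stand-in) and
   * writes each as a CSV line. Returns the number of data rows written.
   */
  static long streamCsv(Iterator<Row> rows, OutputStream out, int flushEvery)
      throws IOException {
    if (flushEvery <= 0) {
      throw new IllegalArgumentException("flushEvery must be positive");
    }
    Writer w = new BufferedWriter(
        new OutputStreamWriter(out, StandardCharsets.UTF_8));
    // The header is flushed before any row is fetched.
    w.write("container_id,state\n");
    w.flush();
    long written = 0;
    while (rows.hasNext()) {
      Row r = rows.next();
      w.write(r.containerId + "," + r.state + "\n");
      written++;
      // Periodic flush keeps memory bounded and the download moving.
      if (written % flushEvery == 0) {
        w.flush();
      }
    }
    w.flush();
    return written;
  }

  public static void main(String[] args) throws IOException {
    // Sample values for illustration only.
    List<Row> sample = Arrays.asList(
        new Row(1L, "MISSING"), new Row(2L, "UNDER_REPLICATED"));
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    long n = streamCsv(sample.iterator(), out, 1);
    System.out.print(out.toString("UTF-8"));
    System.out.println("rows=" + n);
  }
}
```

   In the real endpoint the `OutputStream` would be the one handed to 
`StreamingOutput.write`, and the iterator would be the cursor, which the 
caller still has to close.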
   
   Why the Async/Zip approach is dangerous for Recon: If we implement an async 
polling mechanism where the server zips the files first:
   
   - Disk Space Risk: The Recon server must write 170MB+ of temporary CSV files 
to its local disk, zip them, and store the zip file until the UI downloads it. 
If 5 admins click "Export All" at the same time, Recon suddenly has to write 
roughly 850MB or more (5 x 170MB+) to its local disk. This is a serious 
stability risk that could crash the server.
   - State Management: We would have to build a complex state machine (job 
tracking, polling endpoints, temp file cleanup cron jobs) just to serve a CSV 
file.
   
   The current streaming approach is stateless and uses no local disk I/O.
   
   Since the browser handles the long-running download natively, the user 
experience is seamless. If we want to avoid huge single files for Excel 
compatibility, we can have the UI automatically trigger sequential downloads of 
500K records each (as suggested in the other thread), but keeping the backend 
as a stateless stream is the safest architecture for the server.
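
   For the chunked variant, the arithmetic the UI would need is small. A 
sketch under assumptions: `ChunkPlanSketch` and `planChunks` are hypothetical 
names, and an offset/limit pair is used only for illustration (the real 
endpoint might page by container ID instead).

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkPlanSketch {

  /**
   * Splits `total` records into consecutive {offset, limit} pairs of at most
   * `chunkSize` records each. The last chunk may be smaller.
   */
  static List<long[]> planChunks(long total, long chunkSize) {
    if (total < 0 || chunkSize <= 0) {
      throw new IllegalArgumentException(
          "total must be >= 0 and chunkSize must be > 0");
    }
    List<long[]> chunks = new ArrayList<>();
    for (long offset = 0; offset < total; offset += chunkSize) {
      long limit = Math.min(chunkSize, total - offset);
      chunks.add(new long[] {offset, limit});
    }
    return chunks;
  }

  public static void main(String[] args) {
    // 1.2M records in 500K chunks: two full chunks plus one of 200K.
    for (long[] c : planChunks(1_200_000L, 500_000L)) {
      System.out.println("offset=" + c[0] + " limit=" + c[1]);
    }
  }
}
```

   Each pair would map to one sequential download request, so the backend 
stays a stateless stream per request.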



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


