ArafatKhan2198 commented on code in PR #9994:
URL: https://github.com/apache/ozone/pull/9994#discussion_r3108723712
##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/api/ContainerEndpoint.java:
##########
@@ -450,6 +458,65 @@ private Response getUnhealthyContainersFromSchema(
return Response.ok(response).build();
}
+ /**
+ * Export all unhealthy containers into a CSV file by streaming the results directly
+ * from the database without holding them in the JVM heap.
+ *
+ * @param state The container state to filter by, or null for all.
+ * @param limit The maximum number of records to return, 0 for unlimited.
+ * @return {@link Response} containing the CSV StreamingOutput.
+ */
+ @GET
+ @Path("/unhealthy/export")
+ @Produces("text/csv")
+ public Response exportUnhealthyContainers(
+ @QueryParam("state") String state,
+ @DefaultValue("0") @QueryParam(RECON_QUERY_LIMIT) int limit) {
Review Comment:
Hey @devmadhuu, thanks for the feedback! I totally get the concern about 4M+
records, but this endpoint can serve a full export in a single request without
needing an async/polling setup.
The trick here is that we use a **database cursor** combined with a
**streaming HTTP response**.
This means:
1. **Flat memory usage:** We never load the 4M records into memory. Recon reads
a small batch from the DB, writes it straight to the HTTP response, and discards
it. In my benchmark, memory usage stayed flat at ~256 KB from 10K up to 5M
records.
2. **Fast start:** Because of the new index added in this PR, the DB doesn't
have to sort the 4M records first, so the browser starts receiving the file
almost immediately.
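
For anyone skimming, here is a minimal, stdlib-only sketch of the cursor-plus-streaming pattern described above. It is not the code in this PR: `CsvStreamSketch`, `Row`, `writeCsv`, the two CSV columns, and the 8 KB buffer are all hypothetical, and a plain `Iterator` stands in for the database cursor. The only detail taken from the endpoint is that a `limit` of 0 means unlimited.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;
import java.util.stream.LongStream;

/** Hypothetical sketch of streaming rows to CSV; not the PR's implementation. */
public class CsvStreamSketch {

  /** Hypothetical row shape; the real export columns may differ. */
  public static final class Row {
    final long containerId;
    final String state;

    public Row(long containerId, String state) {
      this.containerId = containerId;
      this.state = state;
    }
  }

  /**
   * Writes rows from the cursor to the output one at a time.
   * Only the current row and the writer's buffer are held in memory,
   * so heap usage does not grow with the number of rows.
   *
   * @param limit maximum rows to write, 0 for unlimited
   * @return the number of rows written
   */
  public static long writeCsv(Iterator<Row> cursor, OutputStream out, int limit)
      throws IOException {
    // Assumed 8 KB buffer; it is flushed to 'out' whenever it fills up.
    BufferedWriter writer = new BufferedWriter(
        new OutputStreamWriter(out, StandardCharsets.UTF_8), 8192);
    writer.write("container_id,state\n");
    long written = 0;
    while ((limit == 0 || written < limit) && cursor.hasNext()) {
      Row row = cursor.next();
      // No CSV escaping here: the sketch assumes values contain no commas or quotes.
      writer.write(Long.toString(row.containerId));
      writer.write(',');
      writer.write(row.state);
      writer.write('\n');
      written++;
    }
    writer.flush();
    return written;
  }

  public static void main(String[] args) throws IOException {
    // A lazily generated sequence stands in for a database cursor.
    Iterator<Row> fakeCursor = LongStream.rangeClosed(1, 5)
        .mapToObj(id -> new Row(id, "MISSING"))
        .iterator();
    long n = writeCsv(fakeCursor, System.out, 3);
    System.err.println("rows written: " + n);
  }
}
```

In a JAX-RS endpoint the same loop would typically run inside a `StreamingOutput`, with the response's `OutputStream` passed in as `out`.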
I actually benchmarked this locally with **5,000,000 records** for a single
state. Here are the results:
| Test Limit | Time to First Byte (TTFB) | Total Time | Downloaded Size | Memory Used |
| --- | --- | --- | --- | --- |
| 10,000 | 0.032 s | 0.038 s | 0.31 MB | ~256 KB |
| 100,000 | 0.019 s | 0.220 s | 3.23 MB | ~256 KB |
| 1,000,000 | 0.019 s | 3.317 s | 33.27 MB | ~256 KB |
| **Complete (5,000,000)** | **0.020 s** | **9.439 s** | **170.60 MB** | **~256 KB** |
As you can see, even for 5 million records, the download starts in **0.02
seconds** and finishes streaming 170 MB in just **9.4 seconds** (roughly
18 MB/s), with only about 256 KB of memory overhead on the Recon server.
If we went the async route, Recon would have to write these massive 170MB
CSV files to its local disk temporarily, which introduces disk cleanup
headaches. This streaming approach skips the disk entirely, uses almost no RAM,
and is super fast. Let me know what you think!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]