ArafatKhan2198 commented on code in PR #10524:
URL: https://github.com/apache/ozone/pull/10524#discussion_r3453413668


##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/chatbot/agent/ChatbotUtils.java:
##########
@@ -129,82 +71,21 @@ public static boolean isBucketScopedListKeysPrefix(String 
startPrefix) {
   // JSON & Text Utilities
   // =========================================================================
 
-  /**
-   * <p>LLMs sometimes wrap their JSON response in prose text (e.g. "Here is 
the result: {...}")
-   * despite being instructed to return JSON only. A simple greedy regex like 
{@code \{.*\}}
-   * fails for nested objects because it can match from the first {@code {} to 
the last {@code }}
-   * in the entire string, returning multiple concatenated objects or 
truncating nested ones.
-   *
-   * <p>This method uses brace-counting with string-awareness to reliably 
extract the first
-   * outermost JSON object regardless of surrounding text, nesting depth, or 
number of
-   * objects in the response:
-   *
-   * @param text the raw LLM response string, which may contain prose 
before/after JSON
-   * @return the first complete JSON object string, or {@code null} if none is 
found
-   */
-  public static String extractFirstJsonObject(String text) {
-    if (text == null) {
-      return null;
-    }
-    int depth = 0;
-    int start = -1;
-    boolean inString = false;
-    boolean escape = false;
-    for (int i = 0; i < text.length(); i++) {
-      char c = text.charAt(i);
-      if (escape) {
-        escape = false;
-        continue;
-      }
-      if (c == '\\' && inString) {
-        escape = true;
-        continue;
-      }
-      if (c == '"') {
-        inString = !inString;
-        continue;
-      }
-      if (inString) {
-        continue;
-      }
-      if (c == '{') {
-        if (depth == 0) {
-          start = i;
-        }
-        depth++;
-      } else if (c == '}') {
-        depth--;
-        if (depth == 0 && start != -1) {
-          return text.substring(start, i + 1);
-        }
-      }
-    }
-    return null;
-  }
-
   public static int parsePositiveInt(String value, int defaultValue) {
     if (StringUtils.isBlank(value)) {
       return defaultValue;
     }
     try {
       int parsed = Integer.parseInt(value.trim());
-      return parsed > 0 ? parsed : defaultValue;
+      if (parsed <= 0) {
+        throw new IllegalArgumentException("limit must be a positive integer");
+      }
+      return parsed;
     } catch (NumberFormatException e) {
       return defaultValue;
     }
   }
 
-  public static String extractStringField(JsonNode node, String field) {
-    if (node == null || field == null || field.isEmpty()) {
-      return null;
-    }
-    JsonNode fieldNode = node.get(field);
-    if (fieldNode == null || fieldNode.isNull()) {
-      return null;
-    }
-    return fieldNode.asText("");
-  }
-
   public static int estimateRecordCount(JsonNode response) {

Review Comment:
   You're right, and this was a real bug. The old `estimateRecordCount` only 
looked for a record array at a few fixed spots (top-level array, `keys`, or 
`data`), so for responses like `listKeys` (records under `fso`/`nonFSO`) or 
containers (records under `data.containers`) it found nothing and returned 0. 
That meant `truncated` never fired and the chatbot would report a capped 
1000-record sample as the complete result.
   
   Rather than hardcode the record field name for each endpoint (which would 
just break again whenever an API's response shape changes), I made the counter 
**shape-agnostic**. 
   
   So as it walks the JSON:
   
   - **Array of objects** (e.g. `containers`, `fso`, `nonFSO`, 
`deletedKeyInfo`) → counts each item, and keeps going deeper.
   - **Array of plain values** (numbers/strings, e.g. histogram bins) → 
**ignored**, because those aren't records.
   - **Plain fields** (`totalCount`, `prevKey`, `state`, etc.) → ignored.
   - **Nested objects** → it just walks into them looking for more record 
arrays.
   
   Then it returns the **sum** of all the record-like arrays it found, no 
matter where they sit or what they're named.
   
   A few consequences worth knowing:
   
   - **Split lists get summed** — `listKeys` returns `fso` *and* `nonFSO`, so 
`600 + 400 = 1000`. That's intended.
   - **It can slightly over-count** if records themselves contain nested object 
arrays (e.g. a container that lists its replicas). We accept that on purpose — 
over-counting only makes the "this might be a partial sample" warning fire a 
touch early, which is the safe direction. It will never under-count and wrongly 
claim a capped list is complete.
   - **It's an estimate, not an exact total** — its only job is to answer "did 
we likely hit the 1000 cap?", not to be a precise record count for display.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to