ArafatKhan2198 commented on code in PR #10524:
URL: https://github.com/apache/ozone/pull/10524#discussion_r3453413668
##########
hadoop-ozone/recon/src/main/java/org/apache/hadoop/ozone/recon/chatbot/agent/ChatbotUtils.java:
##########
@@ -129,82 +71,21 @@ public static boolean isBucketScopedListKeysPrefix(String
startPrefix) {
// JSON & Text Utilities
// =========================================================================
- /**
- * <p>LLMs sometimes wrap their JSON response in prose text (e.g. "Here is
the result: {...}")
- * despite being instructed to return JSON only. A simple greedy regex like
{@code \{.*\}}
- * fails for nested objects because it can match from the first {@code {} to
the last {@code }}
- * in the entire string, returning multiple concatenated objects or
truncating nested ones.
- *
- * <p>This method uses brace-counting with string-awareness to reliably
extract the first
- * outermost JSON object regardless of surrounding text, nesting depth, or
number of
- * objects in the response:
- *
- * @param text the raw LLM response string, which may contain prose
before/after JSON
- * @return the first complete JSON object string, or {@code null} if none is
found
- */
- public static String extractFirstJsonObject(String text) {
- if (text == null) {
- return null;
- }
- int depth = 0;
- int start = -1;
- boolean inString = false;
- boolean escape = false;
- for (int i = 0; i < text.length(); i++) {
- char c = text.charAt(i);
- if (escape) {
- escape = false;
- continue;
- }
- if (c == '\\' && inString) {
- escape = true;
- continue;
- }
- if (c == '"') {
- inString = !inString;
- continue;
- }
- if (inString) {
- continue;
- }
- if (c == '{') {
- if (depth == 0) {
- start = i;
- }
- depth++;
- } else if (c == '}') {
- depth--;
- if (depth == 0 && start != -1) {
- return text.substring(start, i + 1);
- }
- }
- }
- return null;
- }
-
public static int parsePositiveInt(String value, int defaultValue) {
if (StringUtils.isBlank(value)) {
return defaultValue;
}
try {
int parsed = Integer.parseInt(value.trim());
- return parsed > 0 ? parsed : defaultValue;
+ if (parsed <= 0) {
+ throw new IllegalArgumentException("limit must be a positive integer");
+ }
+ return parsed;
} catch (NumberFormatException e) {
return defaultValue;
}
}
- public static String extractStringField(JsonNode node, String field) {
- if (node == null || field == null || field.isEmpty()) {
- return null;
- }
- JsonNode fieldNode = node.get(field);
- if (fieldNode == null || fieldNode.isNull()) {
- return null;
- }
- return fieldNode.asText("");
- }
-
public static int estimateRecordCount(JsonNode response) {
Review Comment:
You're right the old logic only looked at a few top-level array field names
(`data`, `keys`, etc.), so many Recon responses (e.g. nested `data.containers`,
or object-wrapped payloads) could report 0 records and skip truncation
detection.
Replaced it with a recursive walk that sums every **array-of-objects**
anywhere in the JSON tree, regardless of field name. Scalar arrays (size bins,
etc.) are skipped. The count only needs to be good enough to detect the
1000-record cap slightly high is acceptable and only triggers the truncation
warning sooner.
Added unit coverage in `TestChatbotUtils` (legacy `data[]`, nested shapes,
mixed arrays, cap boundaries) and `TestReconQueryExecutor` (999 vs 1000 at the
executor boundary).
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]