Copilot commented on code in PR #3999:
URL: https://github.com/apache/hertzbeat/pull/3999#discussion_r2716003211


##########
hertzbeat-collector/hertzbeat-collector-common/src/main/java/org/apache/hertzbeat/collector/constants/CollectorConstants.java:
##########
@@ -49,4 +49,28 @@ public interface CollectorConstants extends NetworkConstants 
{
 
     String STATUS_CODE = "statusCode";
 
+    /**
+     * Maximum XML response size in bytes (10MB) to prevent DoS attacks
+     */
+    int MAX_XML_RESPONSE_SIZE = 10 * 1024 * 1024;
+
+    /**
+     * Maximum number of nodes returned by XPath query to prevent excessive 
resource consumption
+     */
+    int MAX_XPATH_RESULT_NODES = 1000;
+
+    /**
+     * Dangerous XPath expression patterns that could cause DoS attacks
+     * These patterns match expressions that traverse the entire XML document
+     */
+    String[] DANGEROUS_XPATH_PATTERNS = {
+        "//\\*\\s*\\|\\s*//@\\*\\s*\\|\\s*//text\\(\\)",   // //* | //@* | 
//text()
+        "//\\*\\s*\\|",                                      // //* | ...
+        "//@\\*\\s*\\|",                                     // //@* | ...
+        "//node\\(\\)\\s*\\|",                               // //node() | ...
+        "descendant-or-self::node\\(\\)\\s*\\|",            // 
descendant-or-self::node() | ...
+        "/descendant-or-self::node\\(\\)",                   // 
/descendant-or-self::node()
+        "//\\*[\\s\\S]*//\\*"                                // //** with 
multiple wildcards

Review Comment:
   The inline comment `// //** with multiple wildcards` does not precisely 
describe the regex pattern `"//\\*[\\s\\S]*//\\*"` (which matches two `//*` 
segments with arbitrary content between them, not literally `//**`). To avoid 
confusion when maintaining these security-sensitive patterns, consider 
rephrasing the comment to more accurately describe what the regex is matching.
   ```suggestion
           "//\\*[\\s\\S]*//\\*"                                // //* ... //* 
(two wildcard-descendant segments with arbitrary content between)
   ```



##########
hertzbeat-collector/hertzbeat-collector-basic/src/main/java/org/apache/hertzbeat/collector/collect/http/HttpCollectImpl.java:
##########
@@ -357,21 +374,83 @@ private void parseResponseBySiteMap(String resp, 
List<String> aliasFields,
         }
     }
 
+    /**
+     * Validates the XPath expression to prevent DoS attacks.
+     * Checks for dangerous patterns that could traverse the entire XML 
document.
+     * Uses pre-compiled patterns for performance and case-insensitive 
matching for security.
+     *
+     * @param xpathExpression the XPath expression to validate
+     * @throws IllegalArgumentException if the expression contains dangerous 
patterns
+     */
+    private void validateXPathExpression(String xpathExpression) throws 
IllegalArgumentException {
+        if (!StringUtils.hasText(xpathExpression)) {
+            return;
+        }
+
+        // Check against dangerous patterns using pre-compiled regex
+        for (Pattern pattern : DANGEROUS_XPATH_PATTERNS) {
+            Matcher matcher = pattern.matcher(xpathExpression);
+            if (matcher.find()) {
+                throw new IllegalArgumentException(
+                    "XPath expression contains dangerous pattern that may 
cause DoS: " + pattern.pattern()
+                );
+            }
+        }
+
+        // Check for excessive wildcard usage (more than 3 // or * operators)
+        long descendantAxisCount = xpathExpression.chars().filter(ch -> ch == 
'/').count();
+        long wildcardCount = xpathExpression.chars().filter(ch -> ch == 
'*').count();
+

Review Comment:
   The comment "Check for excessive wildcard usage (more than 3 // or * 
operators)" does not match the implementation: the code counts every '/' (not 
specifically the '//' descendant axis) and uses thresholds of 10 for '/' and 5 
for '*' instead of 3. Please either adjust the thresholds/logic to match the 
documented rule, or update the comment so it accurately describes the condition 
being enforced (e.g., clarify it's counting total '/' and '*' characters with 
the given limits).
   ```suggestion
           // Check for excessive traversal and wildcard usage by counting 
total '/' and '*' characters.
           long descendantAxisCount = xpathExpression.chars().filter(ch -> ch 
== '/').count();
           long wildcardCount = xpathExpression.chars().filter(ch -> ch == 
'*').count();
   
           // Heuristic limits: more than 10 '/' characters or more than 5 '*' 
characters is considered risky.
   ```



##########
hertzbeat-collector/hertzbeat-collector-basic/src/main/java/org/apache/hertzbeat/collector/collect/http/HttpCollectImpl.java:
##########
@@ -357,21 +374,83 @@ private void parseResponseBySiteMap(String resp, 
List<String> aliasFields,
         }
     }
 
+    /**
+     * Validates the XPath expression to prevent DoS attacks.
+     * Checks for dangerous patterns that could traverse the entire XML 
document.
+     * Uses pre-compiled patterns for performance and case-insensitive 
matching for security.
+     *
+     * @param xpathExpression the XPath expression to validate
+     * @throws IllegalArgumentException if the expression contains dangerous 
patterns
+     */
+    private void validateXPathExpression(String xpathExpression) throws 
IllegalArgumentException {
+        if (!StringUtils.hasText(xpathExpression)) {
+            return;
+        }
+
+        // Check against dangerous patterns using pre-compiled regex
+        for (Pattern pattern : DANGEROUS_XPATH_PATTERNS) {
+            Matcher matcher = pattern.matcher(xpathExpression);
+            if (matcher.find()) {
+                throw new IllegalArgumentException(
+                    "XPath expression contains dangerous pattern that may 
cause DoS: " + pattern.pattern()
+                );
+            }
+        }
+
+        // Check for excessive wildcard usage (more than 3 // or * operators)
+        long descendantAxisCount = xpathExpression.chars().filter(ch -> ch == 
'/').count();
+        long wildcardCount = xpathExpression.chars().filter(ch -> ch == 
'*').count();
+
+        if (descendantAxisCount > 10 || wildcardCount > 5) {
+            throw new IllegalArgumentException(
+                "XPath expression contains too many wildcards or descendant 
axes, potential DoS risk"
+            );
+        }
+
+        log.debug("XPath expression validation passed: {}", xpathExpression);
+    }
+
     private void parseResponseByXmlPath(String resp, Metrics metrics,
                                         CollectRep.MetricsData.Builder 
builder, Long responseTime) {
         HttpProtocol http = metrics.getHttp();
         List<String> aliasFields = metrics.getAliasFields();
         String xpathExpression = http.getParseScript();
+
+        // Layer 1: Validate XPath expression is not empty
         if (!StringUtils.hasText(xpathExpression)) {
             log.warn("Http collect parse type is xmlPath, but the xpath 
expression is empty.");
             builder.setCode(CollectRep.Code.FAIL);
             builder.setMsg("XPath expression is empty");
             return;
         }
+
+        // Layer 2: Check XML response size to prevent memory exhaustion
+        if (resp != null && resp.length() > 
CollectorConstants.MAX_XML_RESPONSE_SIZE) {
+            log.warn("XML response size {} bytes exceeds maximum allowed size 
{} bytes",
+                resp.length(), CollectorConstants.MAX_XML_RESPONSE_SIZE);
+            builder.setCode(CollectRep.Code.FAIL);
+            builder.setMsg("XML response exceeds maximum allowed size of "
+                + (CollectorConstants.MAX_XML_RESPONSE_SIZE / 1024 / 1024) + 
"MB");
+            return;

Review Comment:
   `MAX_XML_RESPONSE_SIZE` is documented as a byte limit, but here it's 
compared against `resp.length()` and logged as "bytes", which is the number of 
Java characters, not the original UTF-8 payload size. For XML with many 
multibyte characters this can allow responses substantially larger than the 
intended 10MB while still reporting them as within the byte limit. Consider 
either measuring the actual byte length (e.g., via the original `HttpEntity` or 
re-encoding with the correct charset) or updating the constant's Javadoc and 
log/message text to make it clear this is a character-count limit instead of 
bytes.
   ```suggestion
           if (resp != null) {
               int respSizeBytes = resp.getBytes(StandardCharsets.UTF_8).length;
               if (respSizeBytes > CollectorConstants.MAX_XML_RESPONSE_SIZE) {
                   log.warn("XML response size {} bytes exceeds maximum allowed 
size {} bytes",
                       respSizeBytes, CollectorConstants.MAX_XML_RESPONSE_SIZE);
                   builder.setCode(CollectRep.Code.FAIL);
                   builder.setMsg("XML response exceeds maximum allowed size of 
"
                       + (CollectorConstants.MAX_XML_RESPONSE_SIZE / 1024 / 
1024) + "MB");
                   return;
               }
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to