Copilot commented on code in PR #3999:
URL: https://github.com/apache/hertzbeat/pull/3999#discussion_r2716003211
##########
hertzbeat-collector/hertzbeat-collector-common/src/main/java/org/apache/hertzbeat/collector/constants/CollectorConstants.java:
##########
@@ -49,4 +49,28 @@ public interface CollectorConstants extends NetworkConstants
{
String STATUS_CODE = "statusCode";
+ /**
+ * Maximum XML response size in bytes (10MB) to prevent DoS attacks
+ */
+ int MAX_XML_RESPONSE_SIZE = 10 * 1024 * 1024;
+
+ /**
+ * Maximum number of nodes returned by XPath query to prevent excessive
resource consumption
+ */
+ int MAX_XPATH_RESULT_NODES = 1000;
+
+ /**
+ * Dangerous XPath expression patterns that could cause DoS attacks
+ * These patterns match expressions that traverse the entire XML document
+ */
+ String[] DANGEROUS_XPATH_PATTERNS = {
+ "//\\*\\s*\\|\\s*//@\\*\\s*\\|\\s*//text\\(\\)", // //* | //@* |
//text()
+ "//\\*\\s*\\|", // //* | ...
+ "//@\\*\\s*\\|", // //@* | ...
+ "//node\\(\\)\\s*\\|", // //node() | ...
+ "descendant-or-self::node\\(\\)\\s*\\|", //
descendant-or-self::node() | ...
+ "/descendant-or-self::node\\(\\)", //
/descendant-or-self::node()
+ "//\\*[\\s\\S]*//\\*" // //** with
multiple wildcards
Review Comment:
The inline comment `// //** with multiple wildcards` does not precisely
describe the regex pattern `"//\\*[\\s\\S]*//\\*"` (which matches two `//*`
segments with arbitrary content between them, not literally `//**`). To avoid
confusion when maintaining these security-sensitive patterns, consider
rephrasing the comment to more accurately describe what the regex is matching.
```suggestion
"//\\*[\\s\\S]*//\\*" // //* ... //*
(two wildcard-descendant segments with arbitrary content between)
```
##########
hertzbeat-collector/hertzbeat-collector-basic/src/main/java/org/apache/hertzbeat/collector/collect/http/HttpCollectImpl.java:
##########
@@ -357,21 +374,83 @@ private void parseResponseBySiteMap(String resp,
List<String> aliasFields,
}
}
+ /**
+ * Validates the XPath expression to prevent DoS attacks.
+ * Checks for dangerous patterns that could traverse the entire XML
document.
+ * Uses pre-compiled patterns for performance and case-insensitive
matching for security.
+ *
+ * @param xpathExpression the XPath expression to validate
+ * @throws IllegalArgumentException if the expression contains dangerous
patterns
+ */
+ private void validateXPathExpression(String xpathExpression) throws
IllegalArgumentException {
+ if (!StringUtils.hasText(xpathExpression)) {
+ return;
+ }
+
+ // Check against dangerous patterns using pre-compiled regex
+ for (Pattern pattern : DANGEROUS_XPATH_PATTERNS) {
+ Matcher matcher = pattern.matcher(xpathExpression);
+ if (matcher.find()) {
+ throw new IllegalArgumentException(
+ "XPath expression contains dangerous pattern that may
cause DoS: " + pattern.pattern()
+ );
+ }
+ }
+
+ // Check for excessive wildcard usage (more than 3 // or * operators)
+ long descendantAxisCount = xpathExpression.chars().filter(ch -> ch ==
'/').count();
+ long wildcardCount = xpathExpression.chars().filter(ch -> ch ==
'*').count();
+
Review Comment:
The comment "Check for excessive wildcard usage (more than 3 // or *
operators)" does not match the implementation: the code counts every '/' (not
specifically the '//' descendant axis) and uses thresholds of 10 for '/' and 5
for '*' instead of 3. Please either adjust the thresholds/logic to match the
documented rule, or update the comment so it accurately describes the condition
being enforced (e.g., clarify it's counting total '/' and '*' characters with
the given limits).
```suggestion
// Check for excessive traversal and wildcard usage by counting
total '/' and '*' characters.
long descendantAxisCount = xpathExpression.chars().filter(ch -> ch
== '/').count();
long wildcardCount = xpathExpression.chars().filter(ch -> ch ==
'*').count();
// Heuristic limits: more than 10 '/' characters or more than 5 '*'
characters is considered risky.
```
##########
hertzbeat-collector/hertzbeat-collector-basic/src/main/java/org/apache/hertzbeat/collector/collect/http/HttpCollectImpl.java:
##########
@@ -357,21 +374,83 @@ private void parseResponseBySiteMap(String resp,
List<String> aliasFields,
}
}
+ /**
+ * Validates the XPath expression to prevent DoS attacks.
+ * Checks for dangerous patterns that could traverse the entire XML
document.
+ * Uses pre-compiled patterns for performance and case-insensitive
matching for security.
+ *
+ * @param xpathExpression the XPath expression to validate
+ * @throws IllegalArgumentException if the expression contains dangerous
patterns
+ */
+ private void validateXPathExpression(String xpathExpression) throws
IllegalArgumentException {
+ if (!StringUtils.hasText(xpathExpression)) {
+ return;
+ }
+
+ // Check against dangerous patterns using pre-compiled regex
+ for (Pattern pattern : DANGEROUS_XPATH_PATTERNS) {
+ Matcher matcher = pattern.matcher(xpathExpression);
+ if (matcher.find()) {
+ throw new IllegalArgumentException(
+ "XPath expression contains dangerous pattern that may
cause DoS: " + pattern.pattern()
+ );
+ }
+ }
+
+ // Check for excessive wildcard usage (more than 3 // or * operators)
+ long descendantAxisCount = xpathExpression.chars().filter(ch -> ch ==
'/').count();
+ long wildcardCount = xpathExpression.chars().filter(ch -> ch ==
'*').count();
+
+ if (descendantAxisCount > 10 || wildcardCount > 5) {
+ throw new IllegalArgumentException(
+ "XPath expression contains too many wildcards or descendant
axes, potential DoS risk"
+ );
+ }
+
+ log.debug("XPath expression validation passed: {}", xpathExpression);
+ }
+
private void parseResponseByXmlPath(String resp, Metrics metrics,
CollectRep.MetricsData.Builder
builder, Long responseTime) {
HttpProtocol http = metrics.getHttp();
List<String> aliasFields = metrics.getAliasFields();
String xpathExpression = http.getParseScript();
+
+ // Layer 1: Validate XPath expression is not empty
if (!StringUtils.hasText(xpathExpression)) {
log.warn("Http collect parse type is xmlPath, but the xpath
expression is empty.");
builder.setCode(CollectRep.Code.FAIL);
builder.setMsg("XPath expression is empty");
return;
}
+
+ // Layer 2: Check XML response size to prevent memory exhaustion
+ if (resp != null && resp.length() >
CollectorConstants.MAX_XML_RESPONSE_SIZE) {
+ log.warn("XML response size {} bytes exceeds maximum allowed size
{} bytes",
+ resp.length(), CollectorConstants.MAX_XML_RESPONSE_SIZE);
+ builder.setCode(CollectRep.Code.FAIL);
+ builder.setMsg("XML response exceeds maximum allowed size of "
+ + (CollectorConstants.MAX_XML_RESPONSE_SIZE / 1024 / 1024) +
"MB");
+ return;
Review Comment:
`MAX_XML_RESPONSE_SIZE` is documented as a byte limit, but here it's
compared against `resp.length()` and logged as "bytes", which is the number of
Java characters, not the original UTF-8 payload size. For XML with many
multibyte characters this can allow responses substantially larger than the
intended 10MB while still reporting them as within the byte limit. Consider
either measuring the actual byte length (e.g., via the original `HttpEntity` or
re-encoding with the correct charset) or updating the constant's Javadoc and
log/message text to make it clear this is a character-count limit instead of
bytes.
```suggestion
if (resp != null) {
int respSizeBytes = resp.getBytes(StandardCharsets.UTF_8).length;
if (respSizeBytes > CollectorConstants.MAX_XML_RESPONSE_SIZE) {
log.warn("XML response size {} bytes exceeds maximum allowed
size {} bytes",
respSizeBytes, CollectorConstants.MAX_XML_RESPONSE_SIZE);
builder.setCode(CollectRep.Code.FAIL);
builder.setMsg("XML response exceeds maximum allowed size of
"
+ (CollectorConstants.MAX_XML_RESPONSE_SIZE / 1024 /
1024) + "MB");
return;
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]