zhoujinsong opened a new pull request, #4119:
URL: https://github.com/apache/amoro/pull/4119
## What changes were proposed in this pull request?
When users upload XML configuration files (e.g. `core-site.xml`,
`hdfs-site.xml`) via the AMS dashboard, the uploaded bytes are parsed by
Hadoop's `Configuration.addResource()`. Although the current classpath includes
Woodstox (which does not expand external entities by default), this implicit
protection is fragile: it can silently break if the dependency is later
excluded to resolve a version conflict.
This patch adds explicit XXE protection at the Amoro layer before delegating
to Hadoop `Configuration`, ensuring the security guarantee holds regardless of
the underlying XML parser implementation on the classpath.
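
As an illustration only (the actual diff is not reproduced in this message), a pre-validation step of this kind could look roughly like the sketch below. The class name `XxeGuard` and the method `validate` are hypothetical and are not taken from the patch. The sketch uses only the JDK's built-in JAXP parser and rejects any document that declares a `DOCTYPE`, which is the only place an external entity can be defined.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration; not the class introduced by this PR.
final class XxeGuard {

  private XxeGuard() {}

  /**
   * Parses the uploaded bytes with a hardened parser and throws
   * IllegalArgumentException if they contain a DOCTYPE declaration
   * or are not well-formed XML.
   */
  static void validate(byte[] xmlBytes) {
    try {
      DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
      factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
      // Disallowing DOCTYPE outright means no entity can be declared at all.
      factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
      factory.setXIncludeAware(false);
      factory.setExpandEntityReferences(false);

      DocumentBuilder builder = factory.newDocumentBuilder();
      // The parsed document is discarded; only success or failure matters here.
      builder.parse(new ByteArrayInputStream(xmlBytes));
    } catch (ParserConfigurationException | SAXException | IOException e) {
      throw new IllegalArgumentException("Rejected XML upload: " + e.getMessage(), e);
    }
  }
}
```

In this sketch the caller would invoke `XxeGuard.validate(bytes)` first and pass the same bytes on to Hadoop `Configuration` only if no exception is thrown, so the check does not depend on which parser Hadoop ends up using.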
## Why are the changes needed?
Without explicit protection, a malicious user could upload a crafted XML
file containing an external entity reference like:
```xml
<?xml version="1.0"?>
<!DOCTYPE foo [
<!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<configuration>&xxe;</configuration>
```
If the implicit XXE protection were ever lost (e.g. Woodstox excluded), this
could allow:
- Arbitrary local file read from the AMS server
- SSRF (Server-Side Request Forgery) via external URLs in entity references
## How was this patch tested?
Manual testing: uploaded a well-formed XML file (accepted) and an XML file
with an external entity reference (rejected with error response).
## Does this PR introduce _any_ user-facing change?
No. Legitimate Hadoop XML configuration files (`core-site.xml`,
`hdfs-site.xml`, `hive-site.xml`) do not use external entities. Valid files
continue to upload successfully.