[
https://issues.apache.org/jira/browse/CAUSEWAY-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
PJ Fanning updated CAUSEWAY-3835:
---------------------------------
Description:
https://github.com/apache/causeway/blob/982de018229db2a097080ade53ccfbb4cceffd12/commons/src/main/java/org/apache/causeway/commons/internal/codec/_DocumentFactories.java
1. In `public Document parseDocument(final @Nullable String xml)`, you can
avoid the getBytes call, which wastes memory and may rest on an incorrect
assumption about the char encoding. Not all XML originates as UTF-8, and if you
already have it in String format, you don't need to convert it back to bytes
(forcing the XML parser to turn it back into chars).
```
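// InputSource takes a Reader, so wrap the String in a StringReader
// (StringWriter has no constructor that accepts content).
try (var reader = new StringReader(xml)) {
    var doc = documentBuilder.parse(new InputSource(reader));
    return doc;
}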
```
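For reference, a minimal self-contained sketch of the same idea. The class name `ParseFromString` and the plain `DocumentBuilderFactory.newInstance()` setup are assumptions for illustration, not the actual `_DocumentFactories` code. `InputSource` accepts a `Reader`, so the String is wrapped in a `StringReader`; with a character stream the parser reads the chars directly and ignores any encoding declaration in the XML.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

public class ParseFromString {

    // Hypothetical stand-in for parseDocument(String); the real method may
    // configure its DocumentBuilder differently.
    static Document parseDocument(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        var documentBuilder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        // Hand the parser characters directly: no getBytes(), no charset guess.
        try (var reader = new StringReader(xml)) {
            return documentBuilder.parse(new InputSource(reader));
        }
    }

    public static void main(String[] args) throws Exception {
        // The declared encoding is not UTF-8, yet the text survives intact
        // because the parser never decodes bytes here.
        String xml = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<greeting>caf\u00e9</greeting>";
        Document doc = parseDocument(xml);
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Closing a `StringReader` is not strictly required, but try-with-resources keeps the shape of the snippet above.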
was:
https://github.com/apache/causeway/blob/982de018229db2a097080ade53ccfbb4cceffd12/commons/src/main/java/org/apache/causeway/commons/internal/codec/_DocumentFactories.java
1. In `public Document parseDocument(final @Nullable String xml)`, you can
avoid the getBytes call, which wastes memory and may rest on an incorrect
assumption about the char encoding. Not all XML originates as UTF-8, and if you
already have it in String format, you don't need to convert it back to bytes
(forcing the XML parser to turn it back into chars).
```
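// InputSource takes a Reader, so wrap the String in a StringReader
// (StringWriter has no constructor that accepts content).
try (var reader = new StringReader(xml)) {
    var doc = documentBuilder.parse(new InputSource(reader));
    return doc;
}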
```
2. TransformerFactory is susceptible to XML attacks
https://cheatsheetseries.owasp.org/cheatsheets/XML_External_Entity_Prevention_Cheat_Sheet.html
That page suggests setting:
```
TransformerFactory tf = TransformerFactory.newInstance();
tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
```
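A hedged sketch of how those two attributes could be wrapped in a small factory helper. The names `HardenedTransformer`, `newHardenedFactory` and `identity` are hypothetical and not part of Causeway. It assumes the JDK's built-in JAXP implementation; other `TransformerFactory` implementations may reject these attributes with an `IllegalArgumentException`.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.XMLConstants;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HardenedTransformer {

    // Hypothetical helper: applies the two attributes quoted above.
    static TransformerFactory newHardenedFactory() {
        TransformerFactory tf = TransformerFactory.newInstance();
        // An empty string means "no protocols allowed" for external access.
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_DTD, "");
        tf.setAttribute(XMLConstants.ACCESS_EXTERNAL_STYLESHEET, "");
        return tf;
    }

    // Identity transform, used only to show that ordinary documents still work.
    static String identity(String xml) throws TransformerException {
        Transformer transformer = newHardenedFactory().newTransformer();
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        var out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(identity("<a><b>1</b></a>"));
    }
}
```

Documents without external DTDs or stylesheets pass through unchanged, so the restriction should only affect inputs that try to pull in external resources.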
> suggested improvements to _DocumentFactories.java
> -------------------------------------------------
>
> Key: CAUSEWAY-3835
> URL: https://issues.apache.org/jira/browse/CAUSEWAY-3835
> Project: Causeway
> Issue Type: Task
> Components: Tooling
> Reporter: PJ Fanning
> Assignee: Andi Huber
> Priority: Major
>
> https://github.com/apache/causeway/blob/982de018229db2a097080ade53ccfbb4cceffd12/commons/src/main/java/org/apache/causeway/commons/internal/codec/_DocumentFactories.java
> 1. In `public Document parseDocument(final @Nullable String xml)`, you can
> avoid the getBytes call, which wastes memory and may rest on an incorrect
> assumption about the char encoding. Not all XML originates as UTF-8, and if
> you already have it in String format, you don't need to convert it back to
> bytes (forcing the XML parser to turn it back into chars).
> ```
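> // InputSource takes a Reader, so wrap the String in a StringReader
> // (StringWriter has no constructor that accepts content).
> try (var reader = new StringReader(xml)) {
>     var doc = documentBuilder.parse(new InputSource(reader));
>     return doc;
> }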
> ```