[ https://issues.apache.org/jira/browse/MINVOKER-351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17847586#comment-17847586 ]
ASF GitHub Bot commented on MINVOKER-351:
-----------------------------------------

michael-o commented on PR #242:
URL: https://github.com/apache/maven-invoker-plugin/pull/242#issuecomment-2118973031

@elharo is right. This is what it should look like:

```java
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class XmlControlCharsDemo {
    public static void main(String[] args) throws ParserConfigurationException, TransformerException {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element root = doc.createElement("root");
        // One <char> element per code point 0..126 (Byte.MAX_VALUE is 127 and excluded)
        for (int i = 0; i < Byte.MAX_VALUE; i++) {
            Element elem = doc.createElement("char");
            elem.setTextContent(Character.getName(i) + ": " + ((char) i));
            root.appendChild(elem);
        }
        doc.appendChild(root);
        DOMSource domSource = new DOMSource(doc);
        StreamResult result = new StreamResult(System.out);
        TransformerFactory tf = TransformerFactory.newInstance();
        Transformer transformer = tf.newTransformer();
        // Serialize the DOM to standard out, indented by two spaces
        transformer.setOutputProperty(OutputKeys.INDENT, "yes");
        transformer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "2");
        transformer.transform(domSource, result);
    }
}
```

output:

```
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
<?xml version="1.0" encoding="UTF-8" standalone="no"?> <root> <char>NULL: �</char> <char>START OF HEADING: </char> <char>START OF TEXT: </char> <char>END OF TEXT: </char> <char>END OF TRANSMISSION: </char> <char>ENQUIRY: </char> <char>ACKNOWLEDGE: </char> <char>BELL: </char> <char>BACKSPACE: </char> <char>CHARACTER TABULATION: </char> <char>LINE FEED (LF): </char> <char>LINE TABULATION: </char> <char>FORM FEED (FF): </char> <char>CARRIAGE RETURN (CR): </char> <char>SHIFT OUT: </char> <char>SHIFT IN: </char> <char>DATA LINK ESCAPE: </char> <char>DEVICE CONTROL ONE: </char> <char>DEVICE CONTROL TWO: </char> <char>DEVICE CONTROL THREE: </char> <char>DEVICE CONTROL FOUR: </char> <char>NEGATIVE ACKNOWLEDGE: </char> <char>SYNCHRONOUS IDLE: </char> <char>END OF TRANSMISSION BLOCK: </char> <char>CANCEL: </char> <char>END OF MEDIUM: </char> <char>SUBSTITUTE: </char> <char>ESCAPE: </char> <char>INFORMATION SEPARATOR FOUR: </char> <char>INFORMATION SEPARATOR THREE: </char> <char>INFORMATION SEPARATOR TWO: </char> <char>INFORMATION SEPARATOR ONE: </char> <char>SPACE: </char> <char>EXCLAMATION MARK: !</char> <char>QUOTATION MARK: "</char> <char>NUMBER SIGN: #</char> <char>DOLLAR SIGN: $</char> <char>PERCENT SIGN: %</char> <char>AMPERSAND: &</char> <char>APOSTROPHE: '</char> <char>LEFT PARENTHESIS: (</char> <char>RIGHT PARENTHESIS: )</char> <char>ASTERISK: *</char> <char>PLUS SIGN: +</char> <char>COMMA: ,</char> <char>HYPHEN-MINUS: -</char> <char>FULL STOP: .</char> <char>SOLIDUS: /</char> <char>DIGIT ZERO: 0</char> <char>DIGIT ONE: 1</char> <char>DIGIT TWO: 2</char> <char>DIGIT THREE: 3</char> <char>DIGIT FOUR: 4</char> <char>DIGIT FIVE: 5</char> <char>DIGIT SIX: 6</char> <char>DIGIT SEVEN: 7</char> <char>DIGIT EIGHT: 8</char> <char>DIGIT NINE: 9</char> <char>COLON: :</char> <char>SEMICOLON: ;</char> <char>LESS-THAN SIGN: <</char> <char>EQUALS SIGN: =</char> <char>GREATER-THAN SIGN: ></char> <char>QUESTION MARK: ?</char> <char>COMMERCIAL AT: @</char> <char>LATIN CAPITAL LETTER 
A: A</char> <char>LATIN CAPITAL LETTER B: B</char> <char>LATIN CAPITAL LETTER C: C</char> <char>LATIN CAPITAL LETTER D: D</char> <char>LATIN CAPITAL LETTER E: E</char> <char>LATIN CAPITAL LETTER F: F</char> <char>LATIN CAPITAL LETTER G: G</char> <char>LATIN CAPITAL LETTER H: H</char> <char>LATIN CAPITAL LETTER I: I</char> <char>LATIN CAPITAL LETTER J: J</char> <char>LATIN CAPITAL LETTER K: K</char> <char>LATIN CAPITAL LETTER L: L</char> <char>LATIN CAPITAL LETTER M: M</char> <char>LATIN CAPITAL LETTER N: N</char> <char>LATIN CAPITAL LETTER O: O</char> <char>LATIN CAPITAL LETTER P: P</char> <char>LATIN CAPITAL LETTER Q: Q</char> <char>LATIN CAPITAL LETTER R: R</char> <char>LATIN CAPITAL LETTER S: S</char> <char>LATIN CAPITAL LETTER T: T</char> <char>LATIN CAPITAL LETTER U: U</char> <char>LATIN CAPITAL LETTER V: V</char> <char>LATIN CAPITAL LETTER W: W</char> <char>LATIN CAPITAL LETTER X: X</char> <char>LATIN CAPITAL LETTER Y: Y</char> <char>LATIN CAPITAL LETTER Z: Z</char> <char>LEFT SQUARE BRACKET: [</char> <char>REVERSE SOLIDUS: \</char> <char>RIGHT SQUARE BRACKET: ]</char> <char>CIRCUMFLEX ACCENT: ^</char> <char>LOW LINE: _</char> <char>GRAVE ACCENT: `</char> <char>LATIN SMALL LETTER A: a</char> <char>LATIN SMALL LETTER B: b</char> <char>LATIN SMALL LETTER C: c</char> <char>LATIN SMALL LETTER D: d</char> <char>LATIN SMALL LETTER E: e</char> <char>LATIN SMALL LETTER F: f</char> <char>LATIN SMALL LETTER G: g</char> <char>LATIN SMALL LETTER H: h</char> <char>LATIN SMALL LETTER I: i</char> <char>LATIN SMALL LETTER J: j</char> <char>LATIN SMALL LETTER K: k</char> <char>LATIN SMALL LETTER L: l</char> <char>LATIN SMALL LETTER M: m</char> <char>LATIN SMALL LETTER N: n</char> <char>LATIN SMALL LETTER O: o</char> <char>LATIN SMALL LETTER P: p</char> <char>LATIN SMALL LETTER Q: q</char> <char>LATIN SMALL LETTER R: r</char> <char>LATIN SMALL LETTER S: s</char> <char>LATIN SMALL LETTER T: t</char> <char>LATIN SMALL LETTER U: u</char> <char>LATIN SMALL LETTER V: v</char> 
<char>LATIN SMALL LETTER W: w</char> <char>LATIN SMALL LETTER X: x</char> <char>LATIN SMALL LETTER Y: y</char> <char>LATIN SMALL LETTER Z: z</char> <char>LEFT CURLY BRACKET: {</char> <char>VERTICAL LINE: |</char> <char>RIGHT CURLY BRACKET: }</char> <char>TILDE: ~</char>
</root>
```

which it does not with the Plexus serializer. That means the Plexus serializer is broken.

> Prevent XML-prohibited characters from entering JUnit report
> ------------------------------------------------------------
>
> Key: MINVOKER-351
> URL: https://issues.apache.org/jira/browse/MINVOKER-351
> Project: Maven Invoker Plugin
> Issue Type: Bug
> Reporter: Mikkel Kjeldsen
> Assignee: Slawomir Jaranowski
> Priority: Major
> Fix For: 3.7.0
>
> Attachments: minvoker-351.tar.gz
>
>
> Neither the Maven Invoker plugin's implementation of {{<writeJunitReport>}} nor the underlying XML infrastructure directly protects against the presence of character literals prohibited by the XML specification, meaning such literals can appear in the JUnit report and render it unreadable. *I would appreciate it if the Maven Invoker plugin could learn to strip prohibited literals to protect its users from creative plugins.* I argue that this is a safe and expected transformation that is not materially lossy.
> ----
> h2. Background
> MINVOKER-196 added the {{<writeJunitReport>}} option [back in maven-invoker-plugin-3.2.1|https://github.com/apache/maven-invoker-plugin/blob/maven-invoker-plugin-3.2.1/src/main/java/org/apache/maven/plugins/invoker/AbstractInvokerMojo.java#L1878-L1946]. As of [maven-invoker-plugin-3.6.0 the implementation of the JUnit report remains effectively unchanged|https://github.com/apache/maven-invoker-plugin/blob/maven-invoker-plugin-3.6.0/src/main/java/org/apache/maven/plugins/invoker/AbstractInvokerMojo.java#L1695-L1754].
> The JUnit report includes a {{<system-out>}} element ([example documentation|https://github.com/testmoapp/junitxml]) whose value Maven Invoker populates with the raw build log contents. I've observed that this value is XML-escaped, which I imagine is well understood in the implementation, although I can't immediately find documentation to support that.
> However, escaping notwithstanding, a number of character literals are outright prohibited by the XML specifications. These literals cannot be escaped, and their presence renders an XML document not well formed. The exact set of prohibited characters varies by XML version; the report produced by the Maven Invoker plugin is XML version 1.0. When the Maven Invoker plugin reads in the build log it does not strip these character literals and neither does the XML writer the Maven Invoker plugin relies on. Consequently, if a build log ends up including a prohibited character the resulting JUnit report will not be well formed.
> The set of prohibited characters is the complement of [the XML specification's definition of {{Char}}|https://www.w3.org/TR/xml/#NT-Char].
> h2. Example
> Among the literals prohibited by XML version 1.0 is {{^H}} (backspace). When [pitest runs via Maven|https://pitest.org/quickstart/maven/] it prints a spinner to standard out, and the implementation uses backspace to render the spinner in place. I have used the Maven Invoker plugin with {{<writeJunitReport>}} to verify a pitest configuration, whereby I discovered this limitation.
> h2. Remediation
> h3. Blame plugins
> Perhaps pitest should not behave this way but we can't change pitest, and even if pitest could be changed that offers no protection against any other plugin, so blaming plugins is an ineffective course of action.
> h3. Work-arounds
> The user can manually clean the build log in-place via {{<postBuildHookScript>}}.
> This is technically fairly easy to do, and makes the transformation very explicit, but it requires considerable local work to address an issue many would find obscure and the transformation is permanently lossy unless the user also backs up the raw log to another file name.
> h3. Strip prohibited literals inside Maven Invoker plugin
> If the Maven Invoker plugin learns to strip offending character literals between reading the build log and writing to the {{<system-out>}} value then {{<writeJunitReport>}} will Just Work™, which I assert is what a user will typically expect. Although the {{<system-out>}} value will no longer exactly match the build log contents, this lossy translation is acceptable: the prohibited characters are overwhelmingly unprintable to begin with and therefore cannot be meaningfully rendered in a static context, and the raw build log remains unchanged in the event that the user needs to investigate or assert against the raw output.
> This change would be backwards compatible, because any existing user who would be affected by it would already have unparseable JUnit reports.
> * I _believe_ that Java's {{java.util.regex.Pattern}} can trivially express the complement of allowed characters but there may exist more efficient solutions.
> * Consider also applying this transformation to the 2 uses of {{buildJob.getFailureMessage()}}.
> h4. Replace prohibited literals inside Maven Invoker plugin
> As a variation of stripping prohibited character literals, the Maven Invoker plugin could substitute sentinel values for prohibited character literals. This approach has the downside that it requires additional decision making for determining suitable substitution(s) but is otherwise comparable.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
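
The "strip" and "replace" remediations proposed in the quoted issue could be sketched roughly as follows. This is only an illustration under stated assumptions, not the fix that went into the plugin: the class `XmlCharFilter` and its methods `strip` and `replace` are hypothetical names, and the regular expression is a hand-written complement of the XML 1.0 `Char` production that the issue links to.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the Maven Invoker plugin.
final class XmlCharFilter {
    // XML 1.0 Char: #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    // The negated class matches every code point outside that set.
    private static final Pattern PROHIBITED = Pattern.compile(
            "[^\\x{9}\\x{A}\\x{D}\\x{20}-\\x{D7FF}\\x{E000}-\\x{FFFD}\\x{10000}-\\x{10FFFF}]");

    private XmlCharFilter() {}

    // "Strip" variant: drop every prohibited character.
    static String strip(String text) {
        return PROHIBITED.matcher(text).replaceAll("");
    }

    // "Replace" variant: substitute a caller-chosen sentinel for each prohibited character.
    static String replace(String text, String sentinel) {
        return PROHIBITED.matcher(text).replaceAll(Matcher.quoteReplacement(sentinel));
    }

    public static void main(String[] args) {
        // A made-up log line with backspaces, loosely modelled on a console spinner.
        String log = "analysing |\b/\b-\b done";
        System.out.println(strip(log));
        System.out.println(replace(log, "?"));
    }
}
```

Because `java.util.regex.Pattern` matches by code point, a valid surrogate pair is treated as one supplementary character and kept, while an unpaired surrogate falls outside the allowed ranges and is removed. Whether a regular expression is fast enough for large build logs is left open here, as it is in the issue.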