RonnyRen opened a new issue, #5222: URL: https://github.com/apache/hop/issues/5222
### Apache Hop version?

2.12

### Java version?

18

### Operating system

Windows

### What happened?

I used the "Get data from XML" transform to process a file that is Windows-1252 encoded and contains a special character. The XML declaration in the file carries no encoding information. The error below occurred no matter which encoding I selected in the transform; it went away only when I specified the encoding in the XML file itself.

Error:

org.dom4j.DocumentException: Error on line 13 of document file:///C:/workspace/hop/windows-1252 : Invalid byte 1 of 1-byte UTF-8 sequence.

I looked at the source code and I think I found the root cause. At the line linked below, the `read` function of `SAXReader` appears to be used incorrectly.

https://github.com/apache/hop/blob/98f86412756517e74ef1fcd5552b62a18d898e4a/plugins/transforms/xml/src/main/java/org/apache/hop/pipeline/transforms/xml/getxmldata/GetXmlData.java#L204

According to the documentation, the second parameter is the systemId, not the encoding. The code should call `setEncoding` to specify the encoding of the input source before calling `read`.

Please feel free to correct me if I have something wrong.

Note: XML Input Stream (StAX) works correctly with the specified encoding.

### Issue Priority

Priority: 2

### Issue Component

Component: Transforms
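
The systemId-versus-encoding distinction described above can be reproduced without Hop or dom4j. The sketch below is only an illustration under stated assumptions: it uses the JDK's built-in DOM parser and `org.xml.sax.InputSource` in place of dom4j's `SAXReader`, and the class name `EncodingDemo`, its helper methods, and the sample document are all hypothetical. It parses Windows-1252 bytes twice, once with the encoding name passed only as the systemId and once with `setEncoding` called on the input source. Note that the reported error names the document `file:///C:/workspace/hop/windows-1252`, which is what one would expect if the encoding name had been resolved as a relative systemId.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.Charset;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of Apache Hop or dom4j.
class EncodingDemo {

    // A document with no encoding in its XML declaration.
    static final String XML = "<?xml version=\"1.0\"?><item>Price: \u20ac5</item>";

    // In Windows-1252 the euro sign is the single byte 0x80, which is not valid UTF-8.
    static byte[] sampleBytes() {
        return XML.getBytes(Charset.forName("windows-1252"));
    }

    // Parses the bytes and returns the root element's text.
    // Pass null for systemId or encoding to leave it unset.
    static String rootText(byte[] bytes, String systemId, String encoding) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            InputSource source = new InputSource(new ByteArrayInputStream(bytes));
            source.setSystemId(systemId); // names the document; has no effect on decoding
            source.setEncoding(encoding); // tells the parser how to decode the byte stream
            return builder.parse(source).getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("XML parsing failed: " + e.getMessage(), e);
        }
    }

    public static void main(String[] args) {
        byte[] bytes = sampleBytes();
        try {
            // Mirrors the reported behaviour: the encoding name lands in the systemId slot,
            // so the parser falls back to UTF-8 and rejects the 0x80 byte.
            rootText(bytes, "windows-1252", null);
        } catch (IllegalStateException e) {
            System.out.println("systemId only: " + e.getMessage());
        }
        // Setting the encoding on the input source lets the same bytes parse.
        System.out.println("with encoding: " + rootText(bytes, null, "windows-1252"));
    }
}
```

If the same reasoning holds for dom4j, the corresponding change in the transform would be to call `setEncoding` on the `SAXReader` before `read`, as the report suggests; that has not been verified here against the Hop code.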
