[ https://issues.apache.org/jira/browse/BEAM-2060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Luke Cwik resolved BEAM-2060.
-----------------------------
       Resolution: Fixed
    Fix Version/s: First stable release

Note that charset support is limited to single-byte encodings when reading.

> XmlIO use harcoded Charset
> --------------------------
>
>                 Key: BEAM-2060
>                 URL: https://issues.apache.org/jira/browse/BEAM-2060
>             Project: Beam
>          Issue Type: Improvement
>          Components: sdk-java-core
>    Affects Versions: 0.6.0
>            Reporter: Damien GOUYETTE
>            Assignee: Jean-Baptiste Onofré
>             Fix For: First stable release
>
>
> When I read a file encoded in ISO-8859-1 that contains the character *é*, I get an
> exception like:
> {code}
> Caused by: java.io.CharConversionException: Invalid UTF-8 middle byte 0x64 (at char #1061, byte #1012)
>       at com.ctc.wstx.io.UTF8Reader.reportInvalidOther(UTF8Reader.java:314)
>       at com.ctc.wstx.io.UTF8Reader.read(UTF8Reader.java:205)
>       at com.ctc.wstx.io.MergedReader.read(MergedReader.java:105)
>       at com.ctc.wstx.io.ReaderSource.readInto(ReaderSource.java:86)
>       at com.ctc.wstx.io.BranchingReaderSource.readInto(BranchingReaderSource.java:56)
>       at com.ctc.wstx.sr.StreamScanner.loadMore(StreamScanner.java:1001)
>       ... 19 more
> {code}
> The encoding is hardcoded:
> https://github.com/apache/beam/blob/master/sdks/java/io/xml/src/main/java/org/apache/beam/sdk/io/xml/XmlSource.java#L190
> https://github.com/apache/beam/blob/master/sdks/java/io/xml/src/main/java/org/apache/beam/sdk/io/xml/XmlSource.java#L238
> https://github.com/apache/beam/blob/master/sdks/java/io/xml/src/main/java/org/apache/beam/sdk/io/xml/XmlSource.java#L342
>
> It would be great if I could specify it like this:
> {code}
> XmlSource.from[MyClass](input)
>       .withRootElement("ROOT_ELEMENT")
>       .withRecordElement("MyClass")
>       .withRecordClass(classOf[MyClass])
>       .withCharset(StandardCharsets.ISO_8859_1)
> {code}
> I can provide a pull request if you want.
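
For anyone hitting the same stack trace, the failure can probably be reproduced without Beam or Woodstox. The following is a minimal sketch using only the JDK: in ISO-8859-1 the character é is the single byte 0xE9, which a UTF-8 decoder treats as the start of a three-byte sequence, so the next byte (here `d`, 0x64) is rejected as an invalid continuation byte. The class and method names (`CharsetMismatchDemo`, `decodesCleanly`) are hypothetical and exist only for this illustration; they are not part of XmlIO or XmlSource.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Beam.
public class CharsetMismatchDemo {

    // Returns true if the bytes decode under the given charset without any
    // malformed or unmappable input. REPORT makes the decoder throw instead
    // of silently substituting the replacement character.
    static boolean decodesCleanly(byte[] bytes, Charset charset) {
        try {
            charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT)
                .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // "\u00e9d" is 'é' followed by 'd'; in ISO-8859-1 that is 0xE9 0x64.
        byte[] latin1 = "<name>\u00e9douard</name>".getBytes(StandardCharsets.ISO_8859_1);

        // Reading the bytes as UTF-8 fails on the byte after 0xE9.
        System.out.println("UTF-8 ok:      " + decodesCleanly(latin1, StandardCharsets.UTF_8));

        // Reading them with the charset they were written in succeeds.
        System.out.println("ISO-8859-1 ok: " + decodesCleanly(latin1, StandardCharsets.ISO_8859_1));
    }
}
```

This only illustrates why a hardcoded UTF-8 reader fails on such input; it says nothing about how the fix was implemented in XmlSource.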