I should have named this email thread "XMLEntityScanner/XMLEntityManager corrupting data"; sorry for the confusion. My debugger points me to these two classes, even though I have not yet been able to pinpoint the bug in the code.
The repro case I've sent is fairly self-explanatory: the "b" from "because" ends up overwriting the "r" of "rugs". I used a custom implementation of InputStream in the repro case I sent (to keep it small), but I have seen the bug happen with other implementations of InputStream and much larger XML files. The corruption happens silently, which makes the bug pretty tricky to detect.

Thanks,
Victor Michel
Amazon Web Services

-----Original Message-----
From: Michel, Victor
Sent: Monday, November 03, 2014 10:52 AM
To: [email protected]
Subject: RE: XMLStreamReader corrupting data

Hi,

Thanks for the answer.

XMLStreamReader is implemented by:

> public class XMLStreamReaderImpl implements javax.xml.stream.XMLStreamReader
> (in package com.sun.org.apache.xerces.internal.impl)

It relies heavily on XMLEntityScanner, which is, I believe, the cause of the bug. XMLEntityScanner seems to be part of the Xerces library:
https://xerces.apache.org/xerces2-j/javadocs/xerces2/org/apache/xerces/impl/XMLEntityScanner.html

Maybe I misunderstood something? In any case, I have filed a bug on the Oracle website.

Thanks,
Victor

-----Original Message-----
From: Michael Glavassevich [mailto:[email protected]]
Sent: Monday, November 03, 2014 8:30 AM
To: [email protected]
Subject: Re: XMLStreamReader corrupting data

Hello,

Apache Xerces does not contain an implementation of the XMLStreamReader interface. The component you're using would have been developed by Oracle/Sun and has not been contributed to Apache. We wouldn't know anything about the problem you're experiencing with StAX. Probably better to ask your question on one of the JDK forums.

Thanks.

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: [email protected]
E-mail: [email protected]

"Michel, Victor" <[email protected]> wrote on 11/03/2014 03:12:20 AM:

> Hi all,
>
> I'd like to report something that looks like a bug in the version of
> Xerces included in JRE 7u71/7u72/8u20/8u25. The StAX API seems to
> produce corrupted data, depending on how many bytes the underlying
> InputStream is actually reading at each invocation of
> read(byte[], int, int).
>
> The following repro case will lead to different results depending on
> the version of the JRE. Am I doing something wrong?
>
> Thanks,
>
> Victor
>
> ------
>
> import java.io.ByteArrayInputStream;
> import java.io.FilterInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.nio.charset.Charset;
>
> import javax.xml.stream.XMLInputFactory;
> import javax.xml.stream.XMLStreamReader;
>
> /*
>  * Correct output (7u67,8u11)
>  * rugs
>  *
>  * Incorrect output (7u71,7u72,8u20,8u25)
>  * bugs
>  */
> public class XmlReaderBug {
>
>     private static final int BYTES_PER_READ = 6;
>
>     private static final String XML =
>         "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
>         "<He likes=\"rugs\" because=\"they really tie the room together\"/>";
>
>     public static void main(String[] args) throws Exception {
>         final InputStream xmlStream =
>             new ByteArrayInputStream(XML.getBytes(Charset.forName("UTF-8")));
>         final InputStream throttledXmlStream =
>             new ThrottledInputStream(xmlStream, BYTES_PER_READ);
>
>         final XMLInputFactory xmlFactory = XMLInputFactory.newInstance();
>         final XMLStreamReader xmlStreamReader =
>             xmlFactory.createXMLStreamReader(throttledXmlStream);
>         xmlStreamReader.next();
>
>         // bugs or rugs?
>         System.out.println(xmlStreamReader.getAttributeValue(null, "likes"));
>     }
>
>     // An InputStream implementation that limits the number of bytes
>     // read by read(byte[], int, int)
>     private static class ThrottledInputStream extends FilterInputStream {
>         private final int bytesPerRead;
>
>         public ThrottledInputStream(InputStream stream, int bytesPerRead) throws Exception {
>             super(stream);
>             this.bytesPerRead = bytesPerRead;
>         }
>
>         @Override
>         public int read(byte[] b, int off, int len) throws IOException {
>             if (off < 0 || len < 0 || len > b.length - off) {
>                 throw new IndexOutOfBoundsException();
>             } else if (len == 0) {
>                 return 0;
>             }
>
>             // Limit bytes read
>             int bytesToRead = Math.min(bytesPerRead, len);
>
>             // Ensure deterministic behavior (similar to
>             // org.apache.commons.io.IOUtils.read)
>             // Useless for this test case, but convenient for
>             // consistently reproducing the bug with other stream
>             // implementations
>             int totalBytesRead = 0;
>             int bytesRead = 0;
>             do {
>                 bytesRead = Math.max(0, in.read(b, off + totalBytesRead, bytesToRead));
>                 bytesToRead -= bytesRead;
>                 totalBytesRead += bytesRead;
>             } while (bytesRead > 0);
>
>             // No more bytes
>             if (totalBytesRead == 0) {
>                 return -1;
>             }
>
>             return totalBytesRead;
>         }
>     }
> }
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
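
Since the thread reports that the corruption depends on how many bytes each read(byte[], int, int) call returns, one way to narrow it down might be to sweep over several read sizes and record which ones change the parsed attribute value. The following is only a sketch under stated assumptions, not part of the original repro: the names ChunkSizeSweep, CappedInputStream, readLikes and findCorruptingSizes are hypothetical, and the upper bound of 64 is arbitrary. It reuses the XML document from the repro case and only standard JDK APIs.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class (not from the original thread): parses the
// repro document with different per-read byte limits and collects the
// limits for which the "likes" attribute is not read back as "rugs".
public class ChunkSizeSweep {

    // Same document as in the repro case.
    static final String XML =
        "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
        + "<He likes=\"rugs\" because=\"they really tie the room together\"/>";

    // Simplified stand-in for ThrottledInputStream: caps how many bytes a
    // single bulk read may return, without the retry loop of the original.
    static final class CappedInputStream extends FilterInputStream {
        private final int maxBytesPerRead;

        CappedInputStream(InputStream in, int maxBytesPerRead) {
            super(in);
            this.maxBytesPerRead = maxBytesPerRead;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return super.read(b, off, Math.min(len, maxBytesPerRead));
        }
    }

    // Parses the document through a capped stream and returns the value of
    // the "likes" attribute on the root element.
    static String readLikes(String xml, int bytesPerRead) throws XMLStreamException {
        InputStream in = new CappedInputStream(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), bytesPerRead);
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            reader.next(); // advance from START_DOCUMENT to the root element
            return reader.getAttributeValue(null, "likes");
        } finally {
            reader.close();
        }
    }

    // Tries every read size from 1 to maxBytesPerRead and returns the sizes
    // for which the attribute value differs from the expected "rugs".
    static List<Integer> findCorruptingSizes(String xml, int maxBytesPerRead)
            throws XMLStreamException {
        List<Integer> corrupting = new ArrayList<>();
        for (int size = 1; size <= maxBytesPerRead; size++) {
            if (!"rugs".equals(readLikes(xml, size))) {
                corrupting.add(size);
            }
        }
        return corrupting;
    }

    public static void main(String[] args) throws XMLStreamException {
        List<Integer> bad = findCorruptingSizes(XML, 64);
        System.out.println(bad.isEmpty()
            ? "no corruption observed"
            : "corrupted at bytesPerRead = " + bad);
    }
}
```

What this prints depends on the JRE it runs on; on the versions listed as correct in the repro it would be expected to report no corruption, while an affected version should list at least the read size of 6 used in the original test, assuming the simplified stream triggers the same code path.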
