I should have named this email thread "XMLEntityScanner/XMLEntityManager
corrupting data" - sorry for the confusion.
My debugger points me to these two classes, although I have not yet been able
to pinpoint the bug in the code.

The repro case I've sent is fairly self-explanatory - the "b" from "because" 
ends up overwriting the "r" of "rugs".

I used a custom implementation of InputStream in the repro case I sent (to keep 
it small), but I have seen the bug happen with other implementations of 
InputStream and much larger XML files.
The corruption happens silently, which makes this bug pretty tricky to detect.
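
Because nothing fails visibly, one rough way to check for it is to parse the same bytes once from a plain ByteArrayInputStream and again through a stream that caps each read, then compare the attribute values. The following is only a sketch under assumptions: ChunkedStream, firstAttribute and countMismatches are hypothetical names that are not part of the repro case, and the unthrottled parse is assumed to give the correct value.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class ChunkSizeCheck {

    // Hypothetical wrapper (not from the repro case): returns at most
    // maxBytes bytes per call to read(byte[], int, int).
    static class ChunkedStream extends FilterInputStream {
        private final int maxBytes;

        ChunkedStream(InputStream in, int maxBytes) {
            super(in);
            this.maxBytes = maxBytes;
        }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, Math.min(len, maxBytes));
        }
    }

    // Returns the named attribute of the first start element, or null.
    public static String firstAttribute(InputStream in, String name) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(in);
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    return reader.getAttributeValue(null, name);
                }
            }
            return null;
        } finally {
            reader.close();
        }
    }

    // Counts the chunk sizes in 1..maxChunk for which the throttled parse
    // disagrees with the unthrottled parse (assumed correct).
    public static int countMismatches(byte[] xml, String name, int maxChunk) throws Exception {
        String expected = firstAttribute(new ByteArrayInputStream(xml), name);
        int mismatches = 0;
        for (int n = 1; n <= maxChunk; n++) {
            InputStream chunked = new ChunkedStream(new ByteArrayInputStream(xml), n);
            String actual = firstAttribute(chunked, name);
            if (expected == null ? actual != null : !expected.equals(actual)) {
                System.out.println("chunk size " + n + ": expected " + expected + ", got " + actual);
                mismatches++;
            }
        }
        return mismatches;
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = ("<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
            + "<He likes=\"rugs\" because=\"they really tie the room together\"/>")
            .getBytes(StandardCharsets.UTF_8);
        System.out.println("mismatching chunk sizes: " + countMismatches(xml, "likes", 64));
    }
}
```

Sweeping over a range of chunk sizes rather than a single value should make it easier to see whether the corruption is tied to particular read boundaries.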

Thanks,

Victor Michel
Amazon Web Services


-----Original Message-----
From: Michel, Victor 
Sent: Monday, November 03, 2014 10:52 AM
To: [email protected]
Subject: RE: XMLStreamReader corrupting data

Hi,

Thanks for the answer.
XMLStreamReader is implemented by:
> public class XMLStreamReaderImpl implements 
> javax.xml.stream.XMLStreamReader  (in package 
> com.sun.org.apache.xerces.internal.impl )
It relies heavily on XMLEntityScanner, which is, I believe, the cause of the 
bug.
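
To confirm which class actually backs XMLStreamReader on a given JRE, something like the following could be run. It is a minimal sketch: the class name WhichStaxImpl and the method implementationName are made up for illustration, and the printed class depends on which StAX provider is on the classpath.

```java
import java.io.StringReader;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

public class WhichStaxImpl {

    // Returns the fully qualified name of the XMLStreamReader implementation
    // that the default XMLInputFactory hands out.
    public static String implementationName() throws Exception {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader("<a/>"));
        try {
            return reader.getClass().getName();
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println("XMLStreamReader = " + implementationName());
    }
}
```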

XMLEntityScanner seems to be part of the Xerces library 
https://xerces.apache.org/xerces2-j/javadocs/xerces2/org/apache/xerces/impl/XMLEntityScanner.html

Maybe I misunderstood something? In any case, I have filed a bug on the Oracle
website.

Thanks,

Victor

-----Original Message-----
From: Michael Glavassevich [mailto:[email protected]]
Sent: Monday, November 03, 2014 8:30 AM
To: [email protected]
Subject: Re: XMLStreamReader corrupting data

Hello,

Apache Xerces does not contain an implementation of the XMLStreamReader 
interface. The component you're using would have been developed by Oracle/Sun 
and has not been contributed to Apache. We wouldn't know anything about the 
problem you're experiencing with StAX. Probably better to ask your question on 
one of the JDK forums.

Thanks.

Michael Glavassevich
XML Technologies and WAS Development
IBM Toronto Lab
E-mail: [email protected]
E-mail: [email protected]

"Michel, Victor" <[email protected]> wrote on 11/03/2014 03:12:20 AM:

> Hi all,
> 
> I'd like to report something that looks like a bug in the version of
> Xerces included in JRE 7u71/7u72/8u20/8u25. The StAX API seems to
> produce corrupted data, depending on how many bytes the underlying
> InputStream actually reads at each invocation of read(byte[],
> int, int).
> 
> The following repro case will lead to different results depending on 
> the version of the JRE. Am I doing something wrong?
> 
> Thanks,
> 
> Victor
> 
> ------
> 
> import java.io.ByteArrayInputStream;
> import java.io.FilterInputStream;
> import java.io.IOException;
> import java.io.InputStream;
> import java.nio.charset.Charset;
> 
> import javax.xml.stream.XMLInputFactory;
> import javax.xml.stream.XMLStreamReader;
> 
> /*
>  * Correct output (7u67,8u11)
>  * rugs
>  *
>  * Incorrect output (7u71,7u72,8u20,8u25)
>  * bugs
>  */
> public class XmlReaderBug {
> 
>     private static final int BYTES_PER_READ = 6;
> 
>     private static final String XML =
>         "<?xml version=\"1.0\" encoding=\"UTF-8\"?>" +
>         "<He likes=\"rugs\" because=\"they really tie the room together\"/>";
> 
>     public static void main(String[] args) throws Exception {
>         final InputStream xmlStream =
>             new ByteArrayInputStream(XML.getBytes(Charset.forName("UTF-8")));
>         final InputStream throttledXmlStream =
>             new ThrottledInputStream(xmlStream, BYTES_PER_READ);
> 
>         final XMLInputFactory xmlFactory = XMLInputFactory.newInstance();
>         final XMLStreamReader xmlStreamReader =
>             xmlFactory.createXMLStreamReader(throttledXmlStream);
>         xmlStreamReader.next();
> 
>         // bugs or rugs?
>         System.out.println(xmlStreamReader.getAttributeValue(null, "likes"));
>     }
> 
>     // An InputStream implementation that limits the number of bytes
>     // read by read(byte[], int, int)
>     private static class ThrottledInputStream extends FilterInputStream {
>         private final int bytesPerRead;
> 
>         public ThrottledInputStream(InputStream stream, int bytesPerRead)
>                 throws Exception {
>             super(stream);
>             this.bytesPerRead = bytesPerRead;
>         }
> 
>         @Override
>         public int read(byte[] b, int off, int len) throws IOException {
>             if (off < 0 || len < 0 || len > b.length - off) {
>                 throw new IndexOutOfBoundsException();
>             } else if (len == 0) {
>                 return 0;
>             }
> 
>             // Limit bytes read
>             int bytesToRead = Math.min(bytesPerRead, len);
> 
>             // Ensure deterministic behavior (similar to
>             // org.apache.commons.io.IOUtils.read).
>             // Useless for this test case, but convenient for consistently
>             // reproducing the bug with other stream implementations
>             int totalBytesRead = 0;
>             int bytesRead = 0;
>             do {
>                 bytesRead = Math.max(0,
>                     in.read(b, off + totalBytesRead, bytesToRead));
>                 bytesToRead -= bytesRead;
>                 totalBytesRead += bytesRead;
>             } while (bytesRead > 0);
> 
>             // No more bytes
>             if (totalBytesRead == 0) {
>                 return -1;
>             }
> 
>             return totalBytesRead;
>         }
>     }
> }
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: [email protected]
> For additional commands, e-mail: [email protected]

