[ 
https://issues.apache.org/jira/browse/XERCESJ-1257?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14187272#comment-14187272
 ] 

Martin Buchholz commented on XERCESJ-1257:
------------------------------------------

This bug was noticed affecting Java programs at Google.

We are planning to use Robert's patch, applied against the internal version of 
Xerces used in the JDK.

Here's a test program demonstrating the bug in the JDK:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class UTF8ReaderBug {
    public static void main(String[] args) throws Throwable {
        StringBuilder b = new StringBuilder("<xml>");
        for(int i = 5; i < 8223; i++) {
            b.append(' ');
        }
        // Add surrogate characters which overflow the buffer. This shows the
        // need to place an overflow check at --
        // com.sun.org.apache.xerces.internal.impl.io.UTF8Reader.read(UTF8Reader.java:544)
        b.append("\uD835\uDC37");
        b.append("</xml>");
        sendToParser(b.toString());
    }

    private static void sendToParser(String b) throws Throwable {
        byte[] input = b.getBytes("UTF-8");
        ByteArrayInputStream in = new ByteArrayInputStream(input);

        SAXParserFactory spf = SAXParserFactory.newInstance();
        SAXParser p = spf.newSAXParser();
        p.parse(new InputSource(in), new DefaultHandler());
    }
}



> buffer overflow in UTF8Reader for characters out of BMP
> -------------------------------------------------------
>
>                 Key: XERCESJ-1257
>                 URL: https://issues.apache.org/jira/browse/XERCESJ-1257
>             Project: Xerces2-J
>          Issue Type: Bug
>          Components: JAXP (javax.xml.parsers)
>    Affects Versions: 2.9.0
>         Environment: Any
>            Reporter: Robert Stojnic
>            Assignee: Michael Glavassevich
>            Priority: Minor
>         Attachments: TestXerces.java, UTF8Reader.patch, 
> XERCESJ-1257_tests.patch
>
>
> There is an ArrayIndexOutOfBoundsException in org.apache.xerces.impl.io.UTF8Reader,
> in read(char[],int,int), for 4-byte UTF-8 chars.
> Imagine the following scenario. read() has an output buffer of size N; it reads
> N-1 ASCII chars and stores them in the output buffer. Let the Nth char be the
> first byte of a 4-byte UTF-8 char. The other 3 bytes are fetched by invoking
> read() on the input stream. From these a surrogate pair of Java chars is
> made; however, the method does not check whether both chars fit into the
> output buffer. In most cases they do fit (e.g. if there are other multi-byte
> chars in the fetched text), so the bug is very rare, but it still happens.
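
The scenario described above can be sketched in isolation. This is not the
attached UTF8Reader.patch and not the Xerces code; it is a minimal illustration,
assuming a hypothetical helper decodeFourByte() that decodes one 4-byte UTF-8
sequence into a char[] and refuses to write when fewer than two slots remain.
The class name, method name, and parameters are all made up for the example.

```java
import java.nio.charset.StandardCharsets;

public class SurrogateRoomCheck {
    /**
     * Hypothetical helper (not part of Xerces): decodes one 4-byte UTF-8
     * sequence starting at in[inPos] into out[outPos..outPos+1].
     * Returns 2 on success, or 0 if fewer than two slots remain before limit.
     * Assumes the input bytes form a well-formed 4-byte sequence.
     */
    public static int decodeFourByte(byte[] in, int inPos,
                                     char[] out, int outPos, int limit) {
        // A character outside the BMP needs two Java chars (a surrogate pair).
        // This is the kind of check the issue says is missing.
        if (outPos + 2 > limit) {
            return 0;
        }
        int b0 = in[inPos] & 0xFF;
        int b1 = in[inPos + 1] & 0xFF;
        int b2 = in[inPos + 2] & 0xFF;
        int b3 = in[inPos + 3] & 0xFF;
        // Combine 3 + 6 + 6 + 6 payload bits into a 21-bit code point.
        int codePoint = ((b0 & 0x07) << 18) | ((b1 & 0x3F) << 12)
                      | ((b2 & 0x3F) << 6) | (b3 & 0x3F);
        out[outPos] = Character.highSurrogate(codePoint);
        out[outPos + 1] = Character.lowSurrogate(codePoint);
        return 2;
    }

    public static void main(String[] args) {
        // Same character as in the test program: U+1D437.
        byte[] in = "\uD835\uDC37".getBytes(StandardCharsets.UTF_8);
        char[] out = new char[4];
        // Only one slot left (index 3): the pair does not fit.
        System.out.println(decodeFourByte(in, 0, out, 3, out.length)); // prints 0
        // Two slots left (indices 2 and 3): the pair fits.
        System.out.println(decodeFourByte(in, 0, out, 2, out.length)); // prints 2
    }
}
```

Without the capacity check, the first call would write out[4] and throw
ArrayIndexOutOfBoundsException, which is the failure mode the issue reports.
A real reader would also have to keep the undecoded bytes for the next call
rather than simply returning 0.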


