Re: RFR: 8043592: The basic XML parser based on UKit fails to read XML files encoded in UTF-16BE or LE

huizhe wang Thu, 22 May 2014 22:16:24 -0700

Hi Sherman, Lance,

Thanks for reviews.

It appears that resetting the InputStream is not reliable, since not every InputStream supports mark/reset. I've modified the code accordingly. For the other changes, pls see the inline comments.
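One common way to inspect the first few bytes without relying on mark/reset is to wrap the stream in a PushbackInputStream and unread what was peeked. The sketch below shows that pattern only; it is not necessarily what the revised webrev does, and the class and method names (PeekHeader, peek) are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Arrays;

public class PeekHeader {
    // Hypothetical helper: reads up to n leading bytes, then pushes them back,
    // so the stream is left unchanged without calling InputStream.mark/reset.
    // The PushbackInputStream must have been created with a buffer of at least n bytes.
    static byte[] peek(PushbackInputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int total = 0;
        while (total < n) {
            int r = in.read(buf, total, n - total);
            if (r < 0) {
                break; // end of stream: fewer than n bytes available
            }
            total += r;
        }
        in.unread(buf, 0, total);
        return Arrays.copyOf(buf, total);
    }

    public static void main(String[] args) throws IOException {
        // "<?x" in UTF-16BE without a BOM
        InputStream raw = new ByteArrayInputStream(new byte[] {0x00, 0x3C, 0x00, 0x3F, 0x00, 0x78});
        PushbackInputStream in = new PushbackInputStream(raw, 4);
        byte[] head = peek(in, 4);
        // The next read returns the first byte again, because it was pushed back
        System.out.println(head.length + " " + in.read()); // prints "4 0"
    }
}
```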


On 5/22/2014 10:25 AM, Xueming Shen wrote:

Hi
(1) Do we really need those shifts at ln#2989/90 and 2994/95? It appears to me those bytes have already been determined to be ZERO; we are talking about
     mChar[0] = '<' and mChar[1] = '?' here, right?


Fixed. No need indeed.

(2) For the test, maybe we should just do p.loadFromXML(in)? That path should verify the
     fix as well (the real use scenario), right?

I've removed the test and updated LoadAndStoreXML instead, as Alan suggested, to cover UTF-16BE/LE.
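The "real use scenario" above amounts to a store/load round trip through java.util.Properties. The sketch below only illustrates that idea using the public storeToXML(OutputStream, String, String) and loadFromXML(InputStream) methods; it is not the actual LoadAndStoreXML test, and the class name Utf16RoundTrip is hypothetical. It assumes a JDK that already contains this fix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.Properties;

public class Utf16RoundTrip {
    // Stores the properties as XML in the given encoding, then parses them back
    static Properties roundTrip(Properties props, String encoding) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        props.storeToXML(out, "round trip", encoding);
        Properties loaded = new Properties();
        loaded.loadFromXML(new ByteArrayInputStream(out.toByteArray()));
        return loaded;
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        props.setProperty("greeting", "hello");
        // UTF-16BE and UTF-16LE output carries no BOM, so the parser must detect it from "<?"
        for (String enc : new String[] {"UTF-8", "UTF-16", "UTF-16BE", "UTF-16LE"}) {
            System.out.println(enc + ": " + roundTrip(props, enc).equals(props));
        }
    }
}
```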

(3) Do we have tests for the UTF-16 BOM? If not, I would suggest throwing UTF-16BE/LE-BOM
     into the charset[], just in case.

java.nio.charset states that the UTF-16 encoder writes a BOM, but the UTF-16BE and UTF-16LE encoders do not. That is why the tests behaved differently: the BOM was detected in the UTF-16 case, but not for UTF-16BE/LE.
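The difference described above can be observed directly with String.getBytes. This small check is only an illustration; the class name BomCheck is hypothetical. Java's UTF-16 encoder uses big-endian order and writes the big-endian BOM (FE FF), while UTF-16BE and UTF-16LE write no BOM at all.

```java
import java.nio.charset.StandardCharsets;

public class BomCheck {
    // True if the bytes start with a UTF-16 BOM in either byte order
    static boolean startsWithBom(byte[] b) {
        return b.length >= 2
            && (((b[0] & 0xFF) == 0xFE && (b[1] & 0xFF) == 0xFF)
             || ((b[0] & 0xFF) == 0xFF && (b[1] & 0xFF) == 0xFE));
    }

    public static void main(String[] args) {
        String s = "<?xml";
        System.out.println("UTF-16:   " + startsWithBom(s.getBytes(StandardCharsets.UTF_16)));   // true
        System.out.println("UTF-16BE: " + startsWithBom(s.getBytes(StandardCharsets.UTF_16BE))); // false
        System.out.println("UTF-16LE: " + startsWithBom(s.getBytes(StandardCharsets.UTF_16LE))); // false
    }
}
```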

I added tests that manually prepend a BOM in the UTF-16BE/LE cases to verify that the code is capable of handling them (although normally such input won't come with a BOM).
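Producing BOM-prefixed UTF-16BE/LE input by hand can be sketched as follows. This is an illustration under the assumption that the BOM is simply placed in front of the encoded bytes; the actual test may build its input differently, and PrependBom/withBom are hypothetical names. As a sanity check, Java's generic UTF-16 decoder reads the BOM, picks the byte order from it, and drops it from the decoded text.

```java
import java.nio.charset.StandardCharsets;

public class PrependBom {
    // Encodes text as UTF-16BE or UTF-16LE and puts the matching BOM in front
    static byte[] withBom(String text, boolean bigEndian) {
        byte[] body = text.getBytes(bigEndian ? StandardCharsets.UTF_16BE : StandardCharsets.UTF_16LE);
        byte[] out = new byte[body.length + 2];
        out[0] = (byte) (bigEndian ? 0xFE : 0xFF);
        out[1] = (byte) (bigEndian ? 0xFF : 0xFE);
        System.arraycopy(body, 0, out, 2, body.length);
        return out;
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\"?><root/>";
        // The UTF-16 decoder consumes the BOM and uses it to choose the byte order
        String be = new String(withBom(xml, true), StandardCharsets.UTF_16);
        String le = new String(withBom(xml, false), StandardCharsets.UTF_16);
        System.out.println(be.equals(xml) + " " + le.equals(xml)); // prints "true true"
    }
}
```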


http://cr.openjdk.java.net/~joehw/jdk9/8043592/webrev/
Thanks,
Joe

thanks!
-Sherman

On 05/22/2014 09:30 AM, huizhe wang wrote:
Refer to 8042889: while verifying/testing 8042889, we noticed that the tiny XML parser failed on UTF-16BE or LE. The cause of the failure was that the parser was actually implemented to abide by the XML specification, which requires entities encoded in UTF-16 to begin with a BOM. The test we used sent a byte array to the parser without a BOM, and thus failed.
Since it's not uncommon for an XML document to lack a BOM, I borrowed the technique used in Xerces to add an additional check for UTF-16 encoding. Please review.
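The additional check amounts to recognizing the encoded "<?" of the XML declaration when no BOM is present: 00 3C 00 3F means UTF-16BE and 3C 00 3F 00 means UTF-16LE. The sketch below is a simplified illustration of that idea, not the code in the webrev or in Xerces; EncodingSniffer and detectUtf16 are hypothetical names, and UTF-32 and EBCDIC signatures are deliberately ignored. Because the bytes are compared directly, no shifting is needed.

```java
import java.nio.charset.StandardCharsets;

public class EncodingSniffer {
    // Guesses a UTF-16 variant from the first bytes of an XML entity.
    // Returns null if no UTF-16 signature is recognized (caller falls back to UTF-8 / the declaration).
    // Simplification: a UTF-32LE BOM (FF FE 00 00) would be misreported as UTF-16LE.
    static String detectUtf16(byte[] b) {
        if (b.length >= 2) {
            int b0 = b[0] & 0xFF, b1 = b[1] & 0xFF;
            if (b0 == 0xFE && b1 == 0xFF) return "UTF-16BE"; // BOM, big-endian
            if (b0 == 0xFF && b1 == 0xFE) return "UTF-16LE"; // BOM, little-endian
        }
        if (b.length >= 4) {
            int b0 = b[0] & 0xFF, b1 = b[1] & 0xFF, b2 = b[2] & 0xFF, b3 = b[3] & 0xFF;
            // No BOM: look for "<?" encoded as two-byte units
            if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) return "UTF-16BE";
            if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) return "UTF-16LE";
        }
        return null;
    }

    public static void main(String[] args) {
        String decl = "<?xml version=\"1.0\"?>";
        System.out.println(detectUtf16(decl.getBytes(StandardCharsets.UTF_16BE))); // UTF-16BE
        System.out.println(detectUtf16(decl.getBytes(StandardCharsets.UTF_16LE))); // UTF-16LE
        System.out.println(detectUtf16(decl.getBytes(StandardCharsets.UTF_8)));    // null
    }
}
```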
http://cr.openjdk.java.net/~joehw/jdk9/8043592/webrev/

Thanks,
Joe
