Hi Sherman, Lance,
Thanks for the reviews.
It appears that resetting the InputStream is not reliable, since not
every InputStream supports reset. I've modified the code accordingly.
For the other changes, please see the inline comments.
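To illustrate the reset problem, here is a minimal sketch of one way to look at the first few bytes without relying on mark/reset. It is not taken from the webrev: the PeekExample and Peeked names are hypothetical, and wrapping the input in a java.io.PushbackInputStream is just one possible approach, not necessarily the one the patch uses.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;
import java.util.Arrays;

// Hypothetical helper, for illustration only.
final class PeekExample {

    /** The peeked bytes plus a stream that will still return them. */
    static final class Peeked {
        final byte[] head;
        final InputStream stream;

        Peeked(byte[] head, InputStream stream) {
            this.head = head;
            this.stream = stream;
        }
    }

    /** Reads up to n bytes (n >= 1), then pushes them back so nothing is lost. */
    static Peeked peek(InputStream in, int n) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, n);
        byte[] buf = new byte[n];
        int total = 0;
        while (total < n) {
            int r = pin.read(buf, total, n - total);
            if (r < 0) {
                break; // stream ended before n bytes were available
            }
            total += r;
        }
        pin.unread(buf, 0, total);
        return new Peeked(Arrays.copyOf(buf, total), pin);
    }

    public static void main(String[] args) throws IOException {
        final byte[] data = {0x00, 0x3C, 0x00, 0x3F, 0x00, 0x78};
        // A stream that only implements read(): markSupported() is false
        // and reset() throws, as inherited from java.io.InputStream.
        InputStream plain = new InputStream() {
            private int pos = 0;

            @Override
            public int read() {
                return pos < data.length ? (data[pos++] & 0xFF) : -1;
            }
        };
        System.out.println("markSupported: " + plain.markSupported());
        Peeked p = peek(plain, 4);
        System.out.println("peeked " + p.head.length + " bytes");
        System.out.println("next read from wrapped stream: " + p.stream.read());
    }
}
```

The caller has to continue reading from the returned stream rather than the original one, which is the price of not depending on reset.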
On 5/22/2014 10:25 AM, Xueming Shen wrote:
Hi
(1) Do we really need those shifts at lines 2989/90 and 2994/95? It
appears to me that those bytes have already been determined to be zero;
we are talking about mChar[0] = '<' and mChar[1] = '?' here, right?
Fixed. The shifts are indeed not needed.
(2) For the test, maybe we should just do p.loadFromXML(in)? That path
should verify the fix as well (it is the real use scenario), right?
I've removed the test and updated LoadAndStoreXML instead, as Alan
suggested, to cover UTF-16BE/LE.
(3) Do we have tests for the UTF-16 BOM? If not, I would suggest adding
UTF-16BE/LE-BOM to the charset[], just in case.
The java.nio.charset documentation states that a BOM is written when
encoding in UTF-16, but not in UTF-16BE or UTF-16LE. That is why the
tests behaved differently: a BOM was detected for UTF-16, but not for
UTF-16BE/LE.
I added tests that manually prepend a BOM in the case of UTF-16BE/LE to
verify that the code can handle these cases (although normally such
input won't come with a BOM).
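The encoder behavior described above can be seen with a small sketch like the following. The BomDemo class and its helper methods are hypothetical names used only for this illustration; they are not the code in LoadAndStoreXML.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class BomDemo {

    /** Formats bytes as space-separated upper-case hex pairs. */
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    /** Returns a new array consisting of bom followed by body. */
    static byte[] prependBom(byte[] bom, byte[] body) {
        byte[] out = new byte[bom.length + body.length];
        System.arraycopy(bom, 0, out, 0, bom.length);
        System.arraycopy(body, 0, out, bom.length, body.length);
        return out;
    }

    public static void main(String[] args) {
        String s = "<?";
        // UTF-16 encodes big-endian and writes the BOM FE FF first.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16)));   // FE FF 00 3C 00 3F
        // UTF-16BE and UTF-16LE write no BOM.
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16BE))); // 00 3C 00 3F
        System.out.println(hex(s.getBytes(StandardCharsets.UTF_16LE))); // 3C 00 3F 00

        // Manually prepending a BOM, as a test for the BE/LE cases might do.
        byte[] leBom = {(byte) 0xFF, (byte) 0xFE};
        byte[] le = prependBom(leBom, s.getBytes(StandardCharsets.UTF_16LE));
        System.out.println(hex(le)); // FF FE 3C 00 3F 00
    }
}
```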
http://cr.openjdk.java.net/~joehw/jdk9/8043592/webrev/
Thanks,
Joe
thanks!
-Sherman
On 05/22/2014 09:30 AM, huizhe wang wrote:
Regarding 8042889: while verifying and testing it, we noticed that the
tiny XML parser failed on UTF-16BE and UTF-16LE. The cause of the
failure was that the parser was implemented to abide by the XML
specification, which requires entities encoded in UTF-16 to begin with
a BOM. The test we used sent a byte array to the parser without a BOM,
and thus failed.
Since it's not uncommon for an XML document to lack a BOM, I borrowed
the technique used in Xerces to add an additional check for UTF-16
encoding. Please review.
http://cr.openjdk.java.net/~joehw/jdk9/8043592/webrev/
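For readers without the webrev at hand, a minimal sketch of this kind of check follows. It is not the Xerces or JDK code: the Utf16Sniffer class and detect method are hypothetical, and the byte patterns are the ones listed in Appendix F of the XML 1.0 specification for a document starting with "<?".

```java
// Hypothetical sketch, for illustration only.
final class Utf16Sniffer {

    /**
     * Guesses a UTF-16 flavor from the first bytes of an XML document.
     * Returns "UTF-16BE", "UTF-16LE", or null if no UTF-16 pattern matches.
     * UTF-32 and other encodings are deliberately ignored in this sketch.
     */
    static String detect(byte[] b) {
        if (b.length >= 2) {
            int b0 = b[0] & 0xFF;
            int b1 = b[1] & 0xFF;
            // With a BOM: FE FF is big-endian, FF FE is little-endian.
            if (b0 == 0xFE && b1 == 0xFF) {
                return "UTF-16BE";
            }
            if (b0 == 0xFF && b1 == 0xFE) {
                return "UTF-16LE";
            }
        }
        if (b.length >= 4) {
            int b0 = b[0] & 0xFF;
            int b1 = b[1] & 0xFF;
            int b2 = b[2] & 0xFF;
            int b3 = b[3] & 0xFF;
            // Without a BOM: "<?" is 00 3C 00 3F in BE and 3C 00 3F 00 in LE.
            if (b0 == 0x00 && b1 == 0x3C && b2 == 0x00 && b3 == 0x3F) {
                return "UTF-16BE";
            }
            if (b0 == 0x3C && b1 == 0x00 && b2 == 0x3F && b3 == 0x00) {
                return "UTF-16LE";
            }
        }
        return null;
    }

    public static void main(String[] args) {
        byte[] be = {0x00, 0x3C, 0x00, 0x3F};
        byte[] le = {0x3C, 0x00, 0x3F, 0x00};
        byte[] utf8 = {0x3C, 0x3F, 0x78, 0x6D};
        System.out.println(detect(be));   // UTF-16BE
        System.out.println(detect(le));   // UTF-16LE
        System.out.println(detect(utf8)); // null
    }
}
```

Because the high byte of both '<' and '?' is zero in UTF-16, the position of the zero bytes alone is enough to tell the two byte orders apart.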
Thanks,
Joe