On Fri, Aug 10, 2012 at 6:44 PM, Gary Gregory <[email protected]> wrote:
> Hi All:
>
> Does anyone have expertise with BOMInputStream?
>
> I know that some XML parsers (like the one shipped with the Oracle JRE) do
> not detect UTF-32 BOMs (UTF-8 and UTF-16 BOMs are OK) but using
> BOMInputStream is supposed to fix the issue.
>
> These tests I added and @Ignore'd fail:
>
> -
> org.apache.commons.io.input.BOMInputStreamTest.testReadXmlWithBOMUtf32Be()
> -
> org.apache.commons.io.input.BOMInputStreamTest.testReadXmlWithBOMUtf32Le()
>
> More basic tests do work:
>
> - org.apache.commons.io.input.BOMInputStreamTest.testReadWithBOMUtf32Be()
> - org.apache.commons.io.input.BOMInputStreamTest.testReadWithBOMUtf32Le()
>
> When I look at the Oracle JRE (which uses a copy of Xerces) I see code to
> deal with UCS-4, which is a precursor to UTF-32, like UCS-2 is a subset of
> UTF-16, but as the tests show, Xerces fails to parse a UTF-32 document.
>
> Any thoughts?

Hi Gary,

I enabled the tests and ran them. I'm a bit confused about what the
issue is, because the lines that use the BOMInputStream to *skip* the
UTF-32 BOM do not fail for me:

    parseXml(new BOMInputStream(createUtf32BeDataStream(data, true),
            ByteOrderMark.UTF_32BE));
    parseXml(new BOMInputStream(createUtf32LeDataStream(data, true),
            ByteOrderMark.UTF_32LE));

whereas the lines after those, which do not use any Commons IO components, fail:

    parseXml(createUtf32BeDataStream(data, true));
    parseXml(createUtf32LeDataStream(data, true));

So this just means that the XML parser doesn't deal with a UTF-32 BOM.
Really though, BOMInputStream doesn't provide anything that helps
parse the XML properly. It has two purposes: 1) BOM detection and
2) BOM removal/skipping.

What we do have in Commons IO is XmlStreamReader. It uses various
techniques to detect the encoding, including using BOMInputStream to
try BOM detection, and then uses that encoding with a Reader to process
the bytes properly.
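
To make the "detect the BOM, skip it, then decode with a Reader" idea concrete, here is a minimal sketch using only the JDK. It is not the Commons IO implementation: the class name BomSketch and its methods are made up for illustration, and it assumes the JRE supports the UTF-32BE and UTF-32LE charsets.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PushbackInputStream;
import java.io.Reader;
import java.nio.charset.Charset;

// Hypothetical helper, not part of Commons IO.
final class BomSketch {

    // Longest BOMs first, so UTF-32LE (FF FE 00 00) is checked before UTF-16LE (FF FE).
    private static final String[] NAMES = {"UTF-32BE", "UTF-32LE", "UTF-8", "UTF-16BE", "UTF-16LE"};
    private static final int[][] BOMS = {
        {0x00, 0x00, 0xFE, 0xFF},
        {0xFF, 0xFE, 0x00, 0x00},
        {0xEF, 0xBB, 0xBF},
        {0xFE, 0xFF},
        {0xFF, 0xFE},
    };

    // Returns a Reader positioned after any BOM, decoding with the charset the BOM implies.
    // Falls back to defaultCharset when no known BOM is present.
    static Reader openReader(InputStream in, String defaultCharset) throws IOException {
        PushbackInputStream pin = new PushbackInputStream(in, 4);
        byte[] head = new byte[4];
        int n = 0;
        while (n < head.length) {                 // read up to 4 bytes, tolerating short reads
            int r = pin.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        for (int i = 0; i < BOMS.length; i++) {
            if (startsWith(head, n, BOMS[i])) {
                // Push back only the bytes after the BOM, so the BOM itself is skipped.
                pin.unread(head, BOMS[i].length, n - BOMS[i].length);
                return new InputStreamReader(pin, Charset.forName(NAMES[i]));
            }
        }
        pin.unread(head, 0, n);                   // no BOM: push everything back
        return new InputStreamReader(pin, Charset.forName(defaultCharset));
    }

    private static boolean startsWith(byte[] head, int n, int[] bom) {
        if (n < bom.length) {
            return false;
        }
        for (int i = 0; i < bom.length; i++) {
            if ((head[i] & 0xFF) != bom[i]) {
                return false;
            }
        }
        return true;
    }

    // Drains a Reader into a String.
    static String readAll(Reader reader) throws IOException {
        StringBuilder sb = new StringBuilder();
        char[] buf = new char[1024];
        int k;
        while ((k = reader.read(buf)) != -1) {
            sb.append(buf, 0, k);
        }
        return sb.toString();
    }
}
```

The resulting Reader could then be handed to the parser (for example wrapped in an org.xml.sax.InputSource), so the parser never sees the raw UTF-32 bytes or the BOM.
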
Niall
> Thank you,
> Gary
>
> --
> E-Mail: [email protected] | [email protected]
> JUnit in Action, 2nd Ed: http://bit.ly/ECvg0
> Spring Batch in Action: <http://s.apache.org/HOq>http://bit.ly/bqpbCK
> Blog: http://garygregory.wordpress.com
> Home: http://garygregory.com/
> Tweet! http://twitter.com/GaryGregory