On Fri, Aug 10, 2012 at 4:27 PM, Niall Pemberton <niall.pember...@gmail.com> wrote:

> On Fri, Aug 10, 2012 at 6:44 PM, Gary Gregory <garydgreg...@gmail.com> wrote:
> > Hi All:
> >
> > Does anyone have expertise with BOMInputStream?
> >
> > I know that some XML parsers (like the one shipped with the Oracle JRE) do
> > not detect UTF-32 BOMs (UTF-8 and UTF-16 BOMs are OK) but using
> > BOMInputStream is supposed to fix the issue.
> >
> > These tests I added and @Ignore'd fail:
> >
> > - org.apache.commons.io.input.BOMInputStreamTest.testReadXmlWithBOMUtf32Be()
> > - org.apache.commons.io.input.BOMInputStreamTest.testReadXmlWithBOMUtf32Le()
> >
> > More basic tests do work:
> >
> > - org.apache.commons.io.input.BOMInputStreamTest.testReadWithBOMUtf32Be()
> > - org.apache.commons.io.input.BOMInputStreamTest.testReadWithBOMUtf32Le()
> >
> > When I look at the Oracle JRE (which uses a copy of Xerces) I see code to
> > deal with UCS-4, which is a precursor to UTF-32, like UCS-2 is a subset to
> > UTF-16, but as the test shows, Xerces fails to parse a UTF-32 document.
> >
> > Any thoughts?
>
> Hi Gary,
>
> I enabled the tests and ran them. I'm a bit confused about what the
> issue is because the lines that use the BOMInputStream to *skip* the
> UTF-32 BOM do not fail for me:
>
>     parseXml(new BOMInputStream(createUtf32BeDataStream(data, true),
>         ByteOrderMark.UTF_32BE));
>     parseXml(new BOMInputStream(createUtf32LeDataStream(data, true),
>         ByteOrderMark.UTF_32LE));
>
> whereas the lines after those that do not use any Commons IO components
> fail:
>
>     parseXml(createUtf32BeDataStream(data, true));
>     parseXml(createUtf32LeDataStream(data, true));
>
> So this just means that the XML parser doesn't deal with UTF-32 BOM.
>
> Really though the BOMInputStream stream doesn't provide anything that
> helps parse the XML properly - it has two purposes 1) BOM detection
> and 2) BOM removal/skipping.
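
For reference, the two purposes above (detection and skipping) can be sketched with nothing but java.io. This is only an illustration and not the Commons IO implementation: BomSniffer, wrap and detectAndSkip are made-up names, and the candidate BOMs are checked longest first on the assumption that FF FE 00 00 should be read as UTF-32LE rather than as a UTF-16LE BOM followed by U+0000.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.PushbackInputStream;

/** Hypothetical helper (not part of Commons IO): detects a leading BOM and skips it. */
final class BomSniffer {

    // Candidates ordered longest first so that the UTF-32LE BOM (FF FE 00 00)
    // is tested before the UTF-16LE BOM (FF FE), which is a prefix of it.
    private static final String[] NAMES = {"UTF-32BE", "UTF-32LE", "UTF-8", "UTF-16BE", "UTF-16LE"};
    private static final int[][] BOMS = {
        {0x00, 0x00, 0xFE, 0xFF},
        {0xFF, 0xFE, 0x00, 0x00},
        {0xEF, 0xBB, 0xBF},
        {0xFE, 0xFF},
        {0xFF, 0xFE},
    };

    private BomSniffer() {
    }

    /** Wraps a stream with a pushback buffer large enough for the longest BOM (4 bytes). */
    static PushbackInputStream wrap(InputStream raw) {
        return new PushbackInputStream(raw, 4);
    }

    /**
     * Returns the charset name matching the leading BOM, or null if there is none.
     * On return the stream is positioned at the first byte after the BOM, or at
     * the very first byte when no BOM was found.
     */
    static String detectAndSkip(PushbackInputStream in) throws IOException {
        byte[] head = new byte[4];
        int n = 0;
        // read() may return fewer bytes than requested, so loop until 4 bytes or EOF.
        while (n < head.length) {
            int r = in.read(head, n, head.length - n);
            if (r < 0) {
                break;
            }
            n += r;
        }
        for (int i = 0; i < BOMS.length; i++) {
            int[] bom = BOMS[i];
            if (n >= bom.length && startsWith(head, bom)) {
                // Push back only the bytes that follow the BOM.
                in.unread(head, bom.length, n - bom.length);
                return NAMES[i];
            }
        }
        // No BOM: push back everything that was read.
        in.unread(head, 0, n);
        return null;
    }

    private static boolean startsWith(byte[] data, int[] prefix) {
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }
}
```

With the test data from this thread that would be something like detectAndSkip(wrap(createUtf32BeDataStream(data, true))). Note that this still only tells you which BOM was there; the caller has to hand the remaining bytes to something like an InputStreamReader created for the returned charset name, assuming the JRE provides that charset.
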
>
> What we do have in Commons is XMLInputStream - this uses various
> techniques to detect encoding, including using BOMInputStream to try
> BOM detection and then uses that encoding with a Reader to process
> the bytes properly

Do you mean XmlStreamReader?

Gary

> Niall
>
> > Thank you,
> > Gary

-- 
E-Mail: garydgreg...@gmail.com | ggreg...@apache.org
JUnit in Action, 2nd Ed: http://bit.ly/ECvg0
Spring Batch in Action: <http://s.apache.org/HOq>http://bit.ly/bqpbCK
Blog: http://garygregory.wordpress.com
Home: http://garygregory.com/
Tweet! http://twitter.com/GaryGregory