For Xerces 2.9.1, did you add Xerces to your runtime through the Java endorsed mechanism [1]?
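A quick way to confirm which parser a runtime actually uses is to print the class behind SAXParserFactory.newInstance() and where it was loaded from. This is a minimal sketch, not part of Xerces, and the class name WhichParser is made up for illustration. The class name is the reliable signal: the JDK's built-in copy shows up as com.sun.org.apache.xerces.internal.jaxp.SAXParserFactoryImpl, while Xerces 2.9.1 itself is org.apache.xerces.jaxp.SAXParserFactoryImpl. The location is often empty for anything loaded by the bootstrap loader, which includes jars in the endorsed directory.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParserFactory;

public class WhichParser {

    /** Describes the SAXParserFactory implementation this runtime actually selects. */
    static String describe() {
        Class<?> impl = SAXParserFactory.newInstance().getClass();
        CodeSource src = impl.getProtectionDomain().getCodeSource();
        // Bootstrap-loaded classes (the JDK itself, and also endorsed jars)
        // usually have no code source, so a missing location is expected there.
        String where = (src == null || src.getLocation() == null)
                ? "(no code source: bootstrap loader)"
                : src.getLocation().toString();
        return impl.getName() + " loaded from " + where;
    }

    public static void main(String[] args) {
        System.out.println(describe());
        // Null unless the property was set; the endorsed mechanism was removed in Java 9.
        System.out.println("java.endorsed.dirs = " + System.getProperty("java.endorsed.dirs"));
    }
}
```

Running it once with the plain JDK and once with the Xerces 2.9.1 jars in place should show whether the second run really switched parsers.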
Gary

[1] http://java.sun.com/j2se/1.4.2/docs/guide/standards/

> -----Original Message-----
> From: Thomas Schleu [mailto:tsch...@canto.com]
> Sent: Friday, January 22, 2010 05:29
> To: j-users@xerces.apache.org
> Subject: Parser passes garbage to characters() callback for XML containing character entities
>
> I can reproduce a problem parsing certain XML 1.1 files that contain lots of
> character entities (escaped control chars like "").
> At some point in the file the parser calls my characters() method with
> garbage text.
>
> Here is the source code that generates such an XML file:
>
>     FileOutputStream fos = new FileOutputStream (new File ("C:/test.xml"));
>     fos.write ("<?xml version=\"1.1\" encoding=\"UTF-8\"?>\n<!DOCTYPE X>\n<ns:X xmlns:ns=\"http://www.mycompany.com/ns/X/1.0\">\n".getBytes ("UTF-8"));
>     final byte[] bytes = ("<ns:item>abcdefghijklmnopqrstuvwxyz</ns:item>\n").getBytes ("UTF-8");
>     for (int i = 0; i < 314; i++)
>     {
>         fos.write(bytes);
>     }
>     fos.write ("</ns:X>".getBytes ("UTF-8"));
>     fos.close ();
>
> The XML is very simple; it just contains lots of identical elements with
> "" in the body text.
>
> The parsing code looks like the following:
>
>     FileInputStream fis = new FileInputStream (new File ("C:/test.xml"));
>     final SAXParserFactory saxParserFactory = SAXParserFactory.newInstance ();
>     saxParserFactory.setFeature ("http://xml.org/sax/features/namespaces", Boolean.TRUE);
>     saxParserFactory.setFeature ("http://xml.org/sax/features/namespace-prefixes", Boolean.TRUE);
>     final SAXParser parser = saxParserFactory.newSAXParser ();
>     try
>     {
>         parser.parse (fis, new DefaultHandler()
>         {
>             StringBuilder sb = new StringBuilder ();
>             String currentElement = null;
>
>             public void startElement (String uri, String localName, String qName, Attributes attributes) throws SAXException
>             {
>                 currentElement = localName;
>             }
>             public void characters (char ch[], int start, int length) throws SAXException
>             {
>                 if ("item".equals (currentElement))
>                 {
>                     String s = new String (ch, start, length);
>                     if (sb.length () == 0 && !s.startsWith ("abc"))
>                     {
>                         // THE PARSER CALLS ME WITH GARBAGE!
>                         System.out.println ("ERROR");
>                     }
>                     sb.append (s);
>                 }
>             }
>             public void endElement (String uri, String localName, String qName) throws SAXException
>             {
>                 if ("item".equals (localName))
>                 {
>                     sb.delete (0, sb.length ());
>                     currentElement = null;
>                 }
>             }
>         });
>     }
>     catch (Exception e)
>     {
>         e.printStackTrace ();
>         System.out.println ("e = " + e);
>     }
>
> My characters() method checks whether the body text is the expected text
> starting with "abc".
> After 156 elements with the correct body text, my method gets called with the
> text "x19;<fghijklmnopqrstuvwxyz" as the starting body text of the element.
> The XML document has to exceed 16 kB to show this problem. It may be related
> to the parser's 8 kB internal buffer.
>
> I tested with the parsers shipped with jdk1.5.0_19 and jdk1.6.0_14, as well
> as with the separate xerces-2_9_1 release. All show the same behavior.
> I cannot work around this, because I don't control the XML input.
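One way to narrow this down without a file on disk is a self-contained harness that builds a comparable XML 1.1 document in memory and checks each item's fully accumulated text in endElement(), so corruption anywhere in the element is caught, not only at its start. This is a sketch under stated assumptions: the archived message lost the actual character entities, so the item body below (two control-character references, &#x19; and &#x1;) is a hypothetical stand-in, and it is not known whether this pattern hits the same buffer boundary as the original file. The names EntityReproSketch, BODY_XML, buildDocument and parseItems are made up for illustration, and the code uses Java 7+ syntax.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class EntityReproSketch {

    // Hypothetical item body: the real entities were lost from the archive.
    static final String BODY_XML = "abc&#x19;def&#x1;ghijklmnopqrstuvwxyz";
    // What a correct parser should report for BODY_XML after decoding.
    static final String BODY_TEXT =
            "abc" + (char) 0x19 + "def" + (char) 0x01 + "ghijklmnopqrstuvwxyz";

    /** Builds an XML 1.1 document shaped like the one in the report. */
    static String buildDocument(int items) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.1\" encoding=\"UTF-8\"?>\n<!DOCTYPE X>\n")
           .append("<ns:X xmlns:ns=\"http://www.mycompany.com/ns/X/1.0\">\n");
        for (int i = 0; i < items; i++) {
            xml.append("<ns:item>").append(BODY_XML).append("</ns:item>\n");
        }
        return xml.append("</ns:X>").toString();
    }

    /** Parses the document and returns the full decoded text of every ns:item. */
    static List<String> parseItems(byte[] xml) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        SAXParser parser = factory.newSAXParser();
        final List<String> texts = new ArrayList<>();
        parser.parse(new ByteArrayInputStream(xml), new DefaultHandler() {
            private final StringBuilder sb = new StringBuilder();
            private boolean inItem;

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                inItem = "item".equals(localName);
                sb.setLength(0);
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                // SAX may split one text node across several calls, so accumulate.
                if (inItem) {
                    sb.append(ch, start, length);
                }
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("item".equals(localName)) {
                    texts.add(sb.toString());
                    inItem = false;
                }
            }
        });
        return texts;
    }

    public static void main(String[] args) throws Exception {
        // 400 items of 57 bytes each keeps the document well above 16 kB.
        byte[] xml = buildDocument(400).getBytes(StandardCharsets.UTF_8);
        List<String> texts = parseItems(xml);
        int bad = 0;
        for (int i = 0; i < texts.size(); i++) {
            if (!BODY_TEXT.equals(texts.get(i))) {
                if (bad == 0) {
                    System.out.println("first mismatch at item " + i + ": " + texts.get(i));
                }
                bad++;
            }
        }
        System.out.println(xml.length + " bytes, " + texts.size() + " items, " + bad + " mismatches");
    }
}
```

Running it once with the JDK's built-in parser and once with Xerces 2.9.1 and comparing the mismatch counts would show whether this particular body pattern triggers the problem; if it does not, the body string is the first thing to vary.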
>
> Can anyone help me here?
>
> Thanks in advance
> Thomas Schleu
> Chief Technology Officer
>
> Mail: mailto:tsch...@canto.com
> Phone: +49-30-390 485 0
> Fax: +49-30-390 485 55
>
> Canto GmbH
> Alt-Moabit 73
> D-10555 Berlin
> Germany
> http://www.canto.com
> Amtsgericht Berlin-Charlottenburg HRB 88566
> Managing Director: Hans-Dieter Schädel
>
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
> For additional commands, e-mail: j-users-h...@xerces.apache.org