When you tested with Xerces 2.9.1, did you add it to your runtime through the
Java Endorsed Standards Override Mechanism [1]?
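
One way to confirm which parser implementation is actually picked up at
runtime is to print the class names and where they were loaded from. The
following is only a sketch; the class name ParserOrigin and the helper
describe() are made up for illustration and are not part of Xerces or JAXP.

```java
import java.security.CodeSource;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;

class ParserOrigin {
    // Returns "<class name> from <location>". A missing code source usually
    // means the class was loaded from the JDK itself rather than from a jar.
    static String describe(Class<?> c) {
        CodeSource src = c.getProtectionDomain().getCodeSource();
        String where = (src == null || src.getLocation() == null)
                ? "JDK built-in (no code source)"
                : src.getLocation().toString();
        return c.getName() + " from " + where;
    }

    public static void main(String[] args) throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        SAXParser parser = factory.newSAXParser();
        System.out.println("factory: " + describe(factory.getClass()));
        System.out.println("parser:  " + describe(parser.getClass()));
    }
}
```

If the printed class names start with org.apache.xerces and the location
points at your Xerces jar, the separate Xerces is in use; names under
com.sun.org.apache.xerces.internal indicate the parser bundled with the JDK.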

Gary

[1] http://java.sun.com/j2se/1.4.2/docs/guide/standards/
 

> -----Original Message-----
> From: Thomas Schleu [mailto:tsch...@canto.com]
> Sent: Friday, January 22, 2010 05:29
> To: j-users@xerces.apache.org
> Subject: Parser passes garbage to characters() callback for XML
> containing character entities
> 
> I can reproduce a problem parsing certain XML 1.1 files that contain
> lots of character entities (escaped control chars like "&#x19;").
> At some point in the file the parser calls my characters() method with
> garbage text.
> 
> Here is the source code that generates such an XML file:
> 
>     FileOutputStream fos = new FileOutputStream (new File ("C:/test.xml"));
>     fos.write (("<?xml version=\"1.1\" encoding=\"UTF-8\"?>\n<!DOCTYPE X>\n"
>         + "<ns:X xmlns:ns=\"http://www.mycompany.com/ns/X/1.0\">\n").getBytes ("UTF-8"));
>     final byte[] bytes =
>         ("<ns:item>abcdefghijklmnopqrstuvwxyz&#x19;</ns:item>\n").getBytes ("UTF-8");
>     for (int i = 0; i < 314; i++)
>     {
>         fos.write(bytes);
>     }
>     fos.write ("</ns:X>".getBytes ("UTF-8"));
>     fos.close ();
> 
> The XML is very simple: it just contains lots of identical elements
> with "&#x19;" in the body text.
> The parsing code looks like the following:
> 
>     FileInputStream fis = new FileInputStream (new File ("C:/test.xml"));
>     final SAXParserFactory saxParserFactory = SAXParserFactory.newInstance ();
>     saxParserFactory.setFeature
>         ("http://xml.org/sax/features/namespaces", Boolean.TRUE);
>     saxParserFactory.setFeature
>         ("http://xml.org/sax/features/namespace-prefixes", Boolean.TRUE);
>     final SAXParser parser = saxParserFactory.newSAXParser ();
>     try
>     {
>         parser.parse (fis, new DefaultHandler()
>         {
>             StringBuilder sb = new StringBuilder ();
>             String currentElement = null;
> 
>             public void startElement (String uri, String localName,
>                     String qName, Attributes attributes) throws SAXException
>             {
>                 currentElement = localName;
>             }
>             public void characters (char ch[], int start, int length)
>                     throws SAXException
>             {
>                 if ("item".equals (currentElement))
>                 {
>                     String s = new String (ch, start, length);
>                     if (sb.length () == 0 && !s.startsWith ("abc"))
>                     {
>                         // THE PARSER CALLS ME WITH GARBAGE!
>                         System.out.println ("ERROR");
>                     }
>                     sb.append (s);
>                 }
>             }
>             public void endElement (String uri, String localName,
>                     String qName) throws SAXException
>             {
>                 if ("item".equals (localName))
>                 {
>                     sb.delete (0, sb.length ());
>                     currentElement = null;
>                 }
>             }
>         });
>     }
>     catch (Exception e)
>     {
>         e.printStackTrace ();
>         System.out.println ("e = " + e);
>     }
> 
> My characters() method checks whether the body text is the expected
> text starting with "abc".
> After 156 elements with the correct body text, my method gets called
> with the text "x19;<fghijklmnopqrstuvwxyz" as the starting body text
> of the element.
> The XML has to exceed 16 kB to show this problem. It may be related
> to the 8 kB internal buffer of the parser.
> 
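The buffer hypothesis can be checked with a little arithmetic on the
generated file. The sketch below recomputes the document size and works out
which item contains a given byte offset. The 8192-byte buffer size is taken
from the guess in the report, not from the parser source, and the class and
method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class BufferBoundary {
    // Same pieces the generator above writes to the file.
    static final String HEADER =
            "<?xml version=\"1.1\" encoding=\"UTF-8\"?>\n<!DOCTYPE X>\n"
            + "<ns:X xmlns:ns=\"http://www.mycompany.com/ns/X/1.0\">\n";
    static final String ITEM =
            "<ns:item>abcdefghijklmnopqrstuvwxyz&#x19;</ns:item>\n";
    static final String FOOTER = "</ns:X>";
    static final int ITEM_COUNT = 314;

    static int len(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Total size of the generated document in bytes.
    static int totalBytes() {
        return len(HEADER) + ITEM_COUNT * len(ITEM) + len(FOOTER);
    }

    // Zero-based index of the item that contains the given byte offset.
    static int itemIndexAt(int byteOffset) {
        return (byteOffset - len(HEADER)) / len(ITEM);
    }

    // Position of the given byte offset inside that item.
    static int offsetInItem(int byteOffset) {
        return (byteOffset - len(HEADER)) % len(ITEM);
    }

    public static void main(String[] args) {
        System.out.println("total bytes: " + totalBytes());
        // Assumed buffer size of 8192 bytes; print the first two boundaries.
        for (int boundary = 8192; boundary <= totalBytes(); boundary += 8192) {
            System.out.println("offset " + boundary + " is in item #"
                    + (itemIndexAt(boundary) + 1)
                    + " at position " + offsetInItem(boundary));
        }
    }
}
```

By this arithmetic the byte at offset 8192 lies inside the 156th item, which
is consistent with the first bad callback arriving right after 156 good
elements. That fits the buffer explanation but does not prove it.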
> I tested with the parsers shipping with jdk1.5.0_19 and jdk1.6.0_14, as
> well as with the separate xerces-2_9_1. All show the same behavior.
> I cannot work around this as I don't have control over the XML input.
> 
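To compare parser versions without touching C:/test.xml, the generator and
the handler can be combined into one in-memory test. This is a sketch, not
the original code: it uses setNamespaceAware(true) instead of the two
setFeature calls, collects the text of each item across all characters()
calls, and compares it at endElement. The names CharRefRepro, buildDocument
and parseAndCheck are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class CharRefRepro {
    // Text each item should deliver once "&#x19;" has been resolved.
    static final String EXPECTED = "abcdefghijklmnopqrstuvwxyz" + (char) 0x19;

    // Builds the same document as the generator in the report, in memory.
    static byte[] buildDocument(int items) {
        StringBuilder xml = new StringBuilder();
        xml.append("<?xml version=\"1.1\" encoding=\"UTF-8\"?>\n<!DOCTYPE X>\n");
        xml.append("<ns:X xmlns:ns=\"http://www.mycompany.com/ns/X/1.0\">\n");
        for (int i = 0; i < items; i++) {
            xml.append("<ns:item>abcdefghijklmnopqrstuvwxyz&#x19;</ns:item>\n");
        }
        xml.append("</ns:X>");
        return xml.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Returns { items seen, items whose collected text differed from EXPECTED }.
    static int[] parseAndCheck(byte[] doc) {
        final int[] counts = new int[2];
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setNamespaceAware(true);
            SAXParser parser = factory.newSAXParser();
            parser.parse(new ByteArrayInputStream(doc), new DefaultHandler() {
                private final StringBuilder text = new StringBuilder();
                private boolean inItem;

                @Override
                public void startElement(String uri, String localName,
                        String qName, Attributes attributes) {
                    inItem = "item".equals(localName);
                    text.setLength(0);
                }

                @Override
                public void characters(char[] ch, int start, int length) {
                    // The parser may split one text node over several calls.
                    if (inItem) {
                        text.append(ch, start, length);
                    }
                }

                @Override
                public void endElement(String uri, String localName, String qName) {
                    if ("item".equals(localName)) {
                        counts[0]++;
                        if (!EXPECTED.contentEquals(text)) {
                            counts[1]++;
                        }
                        inItem = false;
                    }
                }
            });
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = parseAndCheck(buildDocument(314));
        System.out.println("items: " + counts[0] + ", mismatches: " + counts[1]);
    }
}
```

Running this once per JDK or Xerces version gives a mismatch count for each.
A parser without the problem should report zero mismatches.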
> Can anyone help me here?
> 
> Thanks in advance
> Thomas Schleu
> Chief Technology Officer
> 
> Mail: mailto:tsch...@canto.com
> Fon:  +49-30-390 485 0
> Fax:  +49-30-390 485 55
> 
> Canto GmbH
> Alt-Moabit 73
> D-10555 Berlin
> Germany
> http://www.canto.com
> Amtsgericht Berlin-Charlottenburg HRB 88566
> Geschäftsführer: Hans-Dieter Schädel
> 
> 
> 
> 
> ---------------------------------------------------------------------
> To unsubscribe, e-mail: j-users-unsubscr...@xerces.apache.org
> For additional commands, e-mail: j-users-h...@xerces.apache.org

