Hi Chris,

As far as I know, dom4j uses the default XML parser of your system to
parse XML files - it doesn't come with its own parser. I've run into
problems several times when I had multiple XML parser jars on the
classpath, because it is then a matter of chance which parser gets used.
In your case you seem to be using Xerces, so I'd say the problem is not
in dom4j, but in Xerces. The XML produced by dom4j seems perfectly
correct to me.
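
To check which parser actually gets picked up on a given classpath, something like the following may help. It is only a sketch: it assumes dom4j resolves its parser through the standard JAXP lookup, and the class name WhichParser and the method saxReaderClassName are made up for this example.

```java
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.XMLReader;

public class WhichParser {

    // Returns the class name of the XMLReader that JAXP selects on the
    // current classpath (hypothetical helper, not part of dom4j).
    static String saxReaderClassName() {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            XMLReader reader = factory.newSAXParser().getXMLReader();
            return reader.getClass().getName();
        } catch (Exception e) {
            // Wrap checked JAXP/SAX exceptions so callers stay simple.
            throw new IllegalStateException("No SAX parser available", e);
        }
    }

    public static void main(String[] args) {
        System.out.println("SAX parser in use: " + saxReaderClassName());
    }
}
```

Running this with the same classpath as the failing program should show whether Xerces or another parser is the one being selected.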

Regards,
Richard

On 10.01.2007 at 10:26, Chris Lai /EEL/IT wrote:

> Richard,
>
> When you do the following,
>
>>         Document document = DocumentHelper.createDocument();
>>         Element root = document.addElement( "root" );
>>
>>         Element author1 = root.addElement( "author" )
>>             .addText( "James Strachan" + (new Character((char) 8)).toString() );
>>
>>         String text = document.asXML();
>>         System.out.println(text);
>
> The result is
>
>       <?xml version="1.0" encoding="UTF-8"?>
>       <root><author>James Strachan&#8;</author></root>
>
> which may be correct, but the dom4j parser rejects the XML that dom4j
> itself generated.
>
> So it may be a bug in either
>
> 1) the XML serialization, which incorrectly encodes backspace as &#8;
> or
> 2) the parser, which incorrectly rejects &#8;
>
> If 2) is true, I need to know the workaround for encoding backspace
> so that dom4j can parse it without an exception.
>
> IE has no problem parsing the XML with &#8;
>
> Regards,
>
> Chris Lai
>
> 29597369
> GET 6303
>
>
> -----Original Message-----
> From: Richard Eckart [mailto:[EMAIL PROTECTED]
> Sent: Wednesday, January 10, 2007 4:29 PM
> To: Chris Lai /EEL/IT
> Cc: dom4j-user@lists.sourceforge.net
> Subject: Re: [dom4j-user] problem on parsing backspace character
>
>
> Hi Chris,
>
> It's due to the XML specification. The backspace character is not a
> valid XML character. If you want to have it in your documents, you
> need to escape it. It seems there is a bug that causes dom4j to fail to
> escape the backspace char when an XML document is serialized to a
> String. What I suppose should happen is that the resulting XML
> contains a &#8; character reference.
>
> See here: http://www.w3.org/TR/REC-xml/#charsets
> (Section 2.2 Characters - Character Range)
>
> Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
> /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
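
The Char production above translates fairly directly into code. The following is a minimal sketch, not anything dom4j provides: the class XmlChars and both of its methods are hypothetical names, and it simply drops disallowed characters before the text is added to the document. As far as I can tell from the spec's "Legal Character" well-formedness constraint, a character reference such as &#8; must also match Char in XML 1.0, so removing (or otherwise encoding) the backspace may be the only way to get a document that a conforming parser accepts.

```java
public class XmlChars {

    // True if the code point matches the XML 1.0 Char production quoted above.
    static boolean isXml10Char(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns a copy of the text without the code points XML 1.0 disallows.
    static String stripInvalid(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            if (isXml10Char(cp)) {
                out.appendCodePoint(cp);
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "James Strachan" + (char) 8;
        // The backspace is removed; the rest of the text is unchanged.
        System.out.println(stripInvalid(raw));
    }
}
```

Calling stripInvalid on the string before passing it to addText would keep the backspace out of the serialized XML altogether, at the cost of losing that character.
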
>
> Cheers,
>
> Richard
>
> On 10.01.2007 at 08:08, Chris Lai /EEL/IT wrote:
>
>> hi,
>>
>> I am having a problem parsing XML that contains a backspace (0x0008) char.
>>
>> (The tab char (0x0009) is fine.)
>>
>> It turns out that dom4j cannot parse XML containing a backspace char,
>> even when the XML was generated by dom4j itself.
>>
>> To demonstrate the problem, here is the code:
>>
>> import org.dom4j.Document;
>> import org.dom4j.DocumentException;
>> import org.dom4j.DocumentHelper;
>> import org.dom4j.Element;
>>
>> public class Foo {
>>
>>     public static void main(String args[] ) {
>>         Document document = DocumentHelper.createDocument();
>>         Element root = document.addElement( "root" );
>>
>>         Element author1 = root.addElement( "author" )
>>             .addText( "James Strachan" + (new Character((char) 8)).toString() );
>>
>>         String text = document.asXML();
>>         System.out.println(text);
>>
>>         try
>>         {
>>             DocumentHelper.parseText(text);
>>         }
>>         catch (DocumentException e)
>>         {
>>             e.printStackTrace();             //<----- exception occurs
>>         }
>>
>>     }
>> }
>>
>> The following are the output:
>>
>> <?xml version="1.0" encoding="UTF-8"?>
>> <root><author>James Strachan&#8;</author></root>
>> org.dom4j.DocumentException: Error on line 2 of document  : Character
>> reference "&#8" is an invalid XML character. Nested exception:
>> Character
>> reference "&#8" is an invalid XML character.
>>      at org.dom4j.io.SAXReader.read(SAXReader.java:482)
>>      at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
>>      at Foo.main(Foo.java:24)
>> Nested exception:
>> org.xml.sax.SAXParseException: Character reference "&#8" is an
>> invalid XML
>> character.
>>      at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>>      at org.dom4j.io.SAXReader.read(SAXReader.java:465)
>>      at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
>>      at Foo.main(Foo.java:24)
>> Nested exception: org.xml.sax.SAXParseException: Character
>> reference "&#8"
>> is an invalid XML character.
>>      at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
>>      at org.dom4j.io.SAXReader.read(SAXReader.java:465)
>>      at org.dom4j.DocumentHelper.parseText(DocumentHelper.java:278)
>>      at Foo.main(Foo.java:24)
>>
>> Does anyone know how to solve it? I am using dom4j-1.6.1.jar.
>>
>> Regards,
>>
>> Chris Lai
>>
>> 29597369
>> GET 6303
>>
>>
>> --------------------------------------------------------------------- 
>> -
>> ---
>> Take Surveys. Earn Cash. Influence the Future of IT
>> Join SourceForge.net's Techsay panel and you'll get the chance to
>> share your
>> opinions on IT & business topics through brief surveys - and earn  
>> cash
>> http://www.techsay.com/default.php?
>> page=join.php&p=sourceforge&CID=DEVDEV
>> _______________________________________________
>> dom4j-user mailing list
>> dom4j-user@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/dom4j-user


