Hi Paul, 
 
your problem can be solved by cloaking the file before parsing and
uncloaking it afterwards. Cloaking hides the DTD and rewrites all entity
references so that they pass through processing untouched. Uncloaking
reverses the process. I had the same problem with one particular operation
on a DocBook file. A Perl script can be found here:
http://docbook.svn.sourceforge.net/viewvc/docbook/trunk/contrib/tools/cloak/
I also have my own VBScript variant for Windows, so there is no need
to install Perl there.
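
To illustrate the idea, here is a minimal Python sketch of cloaking and
uncloaking. It is not the Perl script linked above, and the {{ent:...}}
marker and the function names are my own assumptions. It also assumes the
marker does not already occur in the document and that the DOCTYPE
internal subset, if any, contains no "]" characters of its own.

```python
import re

# Entity and character references such as &nbsp; or &#xE9;
ENTITY = re.compile(r"&(#?[0-9A-Za-z._:-]+);")
# Hypothetical cloak marker; assumed not to appear in the source document.
CLOAKED = re.compile(r"\{\{ent:(#?[0-9A-Za-z._:-]+)\}\}")
# DOCTYPE declaration with an optional (simple) internal subset.
DOCTYPE = re.compile(r"<!DOCTYPE[^\[>]*(?:\[.*?\])?\s*>\s*", re.DOTALL)
XML_DECL = re.compile(r"<\?xml[^>]*\?>\s*")


def cloak(text):
    """Return (cloaked_text, doctype) with the DTD removed and entities masked."""
    match = DOCTYPE.search(text)
    doctype = match.group(0) if match else ""
    if match:
        text = text[:match.start()] + text[match.end():]
    # Replace "&name;" with "{{ent:name}}" so a parser sees plain text.
    return ENTITY.sub(r"{{ent:\1}}", text), doctype


def uncloak(text, doctype=""):
    """Restore the masked entities and put the DOCTYPE back after the XML declaration."""
    text = CLOAKED.sub(r"&\1;", text)
    decl = XML_DECL.match(text)
    pos = decl.end() if decl else 0
    return text[:pos] + doctype + text[pos:]
```

You would call cloak() on the raw file, run the processing step on the
cloaked text, and call uncloak() on the serialized result.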
 
Jan
 
  _____  

From: [email protected] [mailto:[email protected]] 
Sent: Tuesday, March 31, 2009 11:47 AM
To: [email protected]
Cc: [email protected]
Subject: Re: Unicode entity resolved on reading document




Hi Paul,

Paul Wellner Bou <[email protected]> wrote on 03/31/2009 02:57:17 AM:

> [email protected] wrote:
> >    I think it's better to explain why this is a problem for you.
> > As long as the text encoding is correct there shouldn't be any
> > problem with replacing the character... So why is there a problem?
> 
> The problem is not technical in this case. It is a question of slightly 
> correcting some data in the SVG and writing it to a new file which 
> should be as similar as possible with the original file. This is 
> required as the people looking into the file to check it will compare it 
> with the original, don't have much knowledge about XML/SVG and will 
> reject it as there are modified lines which don't have to do anything 
> with the correction.

   Then you will either need to educate them or write a tool that 
operates on the raw text stream.  You could potentially write a 
post-processing step that entifies any characters outside the 
7-bit ASCII range.  It might give output almost identical to the input... 
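
   A rough sketch of such a post-processing step in Python might look 
like this. The function name is made up, and it assumes the original file 
used hexadecimal character references; if it used decimal or named 
entities, the output will differ from the original in those places. 

```python
def entify_non_ascii(text):
    """Replace every character outside 7-bit ASCII with a hex character reference."""
    return "".join(
        ch if ord(ch) < 128 else "&#x{:X};".format(ord(ch))
        for ch in text
    )
```

   This would be run over the serialized output, after the XML tool has 
written the file. 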


> So it is not possible to use an XML parser without replacing entities?

   No, and even if it were, Batik would then fail on valid input such as: 
        <rect fill="&#x23;&#x46;&#x46;&#x30;&#x30;&#x30;&#x30;" 
              x="0" y="0" width="200" height="200"/> 

   So it's likely not useful... 
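
   To see why the references have to be resolved, here is a small 
illustration using Python's standard xml.etree.ElementTree rather than 
Batik (an assumption made only for the sake of a short example). The 
parser has to decode the character references before the fill value can 
be understood as a colour: 

```python
import xml.etree.ElementTree as ET

SVG_RECT = (
    '<rect fill="&#x23;&#x46;&#x46;&#x30;&#x30;&#x30;&#x30;" '
    'x="0" y="0" width="200" height="200"/>'
)

rect = ET.fromstring(SVG_RECT)
# The parser resolves the character references in the attribute value.
fill = rect.get("fill")
print(fill)
```

   Without that resolution the attribute would hold the literal reference 
text, which is not a usable colour value. 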

