Hi Paul,

your problem can be solved by cloaking the file before parsing and uncloaking it afterwards. Cloaking hides the DTD and modifies all entities so that they are left untouched during processing. Uncloaking is the reverse process. I had the same problem with one special operation on a DocBook file.

A Perl script can be found here:
http://docbook.svn.sourceforge.net/viewvc/docbook/trunk/contrib/tools/cloak/

I also have my own VB script variant for Windows, so there is no need to install Perl there.

Jan

_____
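For illustration only, the cloak/uncloak idea described above can be sketched in a few lines of Python. This is not the Perl script linked above and not the VB script variant. The marker string and the function names cloak and uncloak are hypothetical, and the sketch only covers entity references; it does not hide the DTD.

```python
import re

# Hypothetical marker (an assumption of this sketch); it must not occur
# anywhere in the real document.
MARK = "__CLOAKED_AMP__"

# Matches decimal (&#160;), hexadecimal (&#xA0;) and named (&nbsp;) references.
ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9A-Fa-f]+|[A-Za-z_:][\w.:-]*);")


def cloak(text):
    """Replace the '&' of every entity reference with MARK.

    After this step the references are ordinary text, so an XML parser
    has nothing to resolve and passes them through unchanged.
    """
    if MARK in text:
        raise ValueError("marker already present in input")
    return ENTITY_RE.sub(lambda m: MARK + m.group(1) + ";", text)


def uncloak(text):
    """Reverse cloak(): turn every MARK back into '&'."""
    return text.replace(MARK, "&")


original = "<text>caf&#233; &amp; &nbsp;bar</text>"
cloaked = cloak(original)
restored = uncloak(cloaked)
```

In a real pipeline the parse, the correction and the serialization would all happen between cloak() and uncloak(). The sketch only guarantees that the references themselves survive; anything else the serializer changes (attribute order, whitespace, quoting) is outside its scope.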
From: [email protected] [mailto:[email protected]]
Sent: Tuesday, March 31, 2009 11:47 AM
To: [email protected]
Cc: [email protected]
Subject: Re: Unicode entity resolved on reading document

Hi Paul,

Paul Wellner Bou <[email protected]> wrote on 03/31/2009 02:57:17 AM:

> [email protected] wrote:
> > I think it's better to explain why this is a problem for you.
> > As long as the text encoding is correct there shouldn't be any
> > problem with replacing the character... So why is there a problem?
>
> The problem is not technical in this case. It is a question of slightly
> correcting some data in the SVG and writing it to a new file which
> should be as similar as possible with the original file. This is
> required as the people looking into the file to check it will compare it
> with the original, don't have much knowledge about XML/SVG and will
> reject it as there are modified lines which don't have to do anything
> with the correction.

Then you will either need to educate them or write a tool that operates on the raw text stream. You could potentially write a post-processing step that entifies any characters outside 7-bit ASCII. It might give almost the same text as the input...

> So it is not possible to use an XML parser without replacing entities?

No, even if it was, Batik would fail on valid input:

<rect fill="#FF0000" x="0" y="0" width="200" height="200"/>

So it's likely not useful...
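
The post-processing step suggested in the message above, turning every character outside the 7-bit range back into a character reference, might look roughly like the following Python sketch. The function name entify and its hexadecimal switch are hypothetical. It also assumes that the original file used numeric references; named entities cannot be recovered this way, and a character that was written literally in the original would be entified as well.

```python
def entify(text, hexadecimal=True):
    """Replace each character above U+007F with a numeric character reference.

    hexadecimal=True  gives &#xE9;
    hexadecimal=False gives &#233;
    Which form matches the original file is an assumption to check by hand.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 128:
            # Plain 7-bit character: keep as is.
            out.append(ch)
        elif hexadecimal:
            out.append("&#x{:X};".format(code))
        else:
            out.append("&#{};".format(code))
    return "".join(out)


hex_form = entify("caf\u00e9")                      # "caf&#xE9;"
dec_form = entify("caf\u00e9", hexadecimal=False)   # "caf&#233;"
```

For the decimal form, Python's built-in "xmlcharrefreplace" error handler does the same job: "caf\u00e9".encode("ascii", "xmlcharrefreplace") yields b"caf&#233;".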
