Thanks for the tip. I downloaded the full XOM source, modified
XIncludeDriver.java, and ran the Ant build to generate a new
xom-samples.jar. I copied it into my DocBook toolchain, replacing the
original xom-samples.jar. XOM serialization now produces UTF-8 output,
which makes for much better DocBook processing downstream.
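In case it helps anyone else, the change amounts to resolving the
xincludes and then serializing with an explicit UTF-8 encoding. Here is
an untested sketch of a standalone driver against the public XOM API
(the class name Utf8XIncludeDriver is my own, not part of XOM):

```java
import nu.xom.Builder;
import nu.xom.Document;
import nu.xom.Serializer;
import nu.xom.xinclude.XIncluder;

// Minimal XInclude driver that serializes as UTF-8 rather than the
// ISO-8859-1 hardcoded in the XIncludeDriver sample.
public class Utf8XIncludeDriver {
    public static void main(String[] args) throws Exception {
        Builder builder = new Builder();
        Document input = builder.build(args[0]);        // parse the source doc
        Document resolved = XIncluder.resolve(input);   // expand xi:include elements
        Serializer serializer = new Serializer(System.out, "UTF-8");
        serializer.write(resolved);                     // emit the resolved doc
    }
}
```

Run it the same way as the sample, redirecting stdout to the output
file. It needs xom-1.2.x.jar on the classpath but not xom-samples.jar.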
P.S. The build wasn't quite that simple: it failed at first, complaining
that it couldn't find jaxen-1.1.3-src. I downloaded that from
http://jaxen.codehaus.org/releases.html. Thankfully that made the build
happy.
On 03/22/2011 02:02 PM, Mauritz Jeanson wrote:
| -----Original Message-----
| From: Denis Bradford
|
| I've been trying to preprocess xincludes in my DocBook 5
| build with xom,
| using the incantation in Bob Stayton's Complete Guide:
|
| $ java -cp xom-1.2.1.jar:xom-samples.jar
| nu.xom.samples.XIncludeDriver source.xml serialized.xml
|
| The xincludes resolve just fine, but the serialized doc's
| encoding comes
| out as ISO-8859-1, so xom complains about UTF-8 characters in the
| source. The output doc ends, incomplete, with a cascade of xom
| Serializer errors.
|
| According to the XOM api doc, it should be possible to specify the
| encoding as UTF-8, but I haven't found how to do it from the command
| line. Anybody know how (or if there's a better solution)?
| I'm assuming
| the failure is on account of the encoding problem, since the
| document
| seems to process normally otherwise.
I just tried to process a couple of UTF-8 documents with XIncludeDriver
(using XOM 1.2.6), and there were no errors. Unencodable characters were
escaped as numeric character references in the output.
The encoding is hardcoded in XIncludeDriver.java:
Serializer outputter = new Serializer(System.out, "ISO-8859-1");
This may be a little unfortunate (and not too hard to fix), but
XIncludeDriver is just a sample application after all.
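For what it's worth, the escaping (or the errors Denis saw) follows
directly from the charset itself: ISO-8859-1 covers only 256 code
points, so most Unicode characters simply cannot be encoded in it. A
quick check with nothing but the JDK (class name EncodingCheck is mine):

```java
import java.nio.charset.Charset;

// Shows why an ISO-8859-1 Serializer must escape or reject characters
// that a UTF-8 Serializer writes directly.
public class EncodingCheck {
    public static void main(String[] args) {
        String euro = "\u20AC"; // EURO SIGN, outside Latin-1
        System.out.println(
            Charset.forName("ISO-8859-1").newEncoder().canEncode(euro)); // false
        System.out.println(
            Charset.forName("UTF-8").newEncoder().canEncode(euro));      // true
    }
}
```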
Mauritz