Re: [docbook-apps] Serializing DB 5 with XOM: wrong encoding

2011-03-25 Thread Denis Bradford
Thanks for the tip. I downloaded the full xom source, modified 
XIncludeDriver.java, and ran ant java to generate a new xom-samples.jar. 
I copied it to my docbook toolchain, replacing the original 
xom-samples.jar. XOM serialization generates UTF-8 encoding now. Makes 
for much better DocBook processing from there.



P.S. The build wasn't quite that simple: it failed at first, complaining 
that it couldn't find jaxen-1.1.3-src. I downloaded that from 
http://jaxen.codehaus.org/releases.html. Thankfully that made the build 
happy.



On 03/22/2011 02:02 PM, Mauritz Jeanson wrote:

|  -----Original Message-----
|  From: Denis Bradford
|
|  I've been trying to preprocess xincludes in my DocBook 5
|  build with xom,
|  using the incantation in Bob Stayton's Complete Guide:
|
|  $ java -cp xom-1.2.1.jar:xom-samples.jar \
|  nu.xom.samples.XIncludeDriver source.xml > serialized.xml
|
|  The xincludes resolve just fine, but the serialized doc's encoding
|  comes out as ISO-8859-1, so xom complains about UTF-8 characters in
|  the source. The output doc ends, incomplete, with a cascade of xom
|  Serializer errors.
|
|  According to the XOM api doc, it should be possible to specify the
|  encoding as UTF-8, but I haven't found how to do it from the command
|  line. Anybody know how (or if there's a better solution)?
|  I'm assuming
|  the failure is on account of the encoding problem, since the
|  document
|  seems to process normally otherwise.


I just tried to process a couple of UTF-8 documents with XIncludeDriver
(using XOM 1.2.6), and there were no errors. Unencodable characters were
escaped as numeric character references in the output.

The encoding is hardcoded in XIncludeDriver.java:

  Serializer outputter = new Serializer(System.out, "ISO-8859-1");

This may be a little unfortunate (and not too hard to fix), but
XIncludeDriver is just a sample application after all.

Mauritz






-
To unsubscribe, e-mail: docbook-apps-unsubscr...@lists.oasis-open.org
For additional commands, e-mail: docbook-apps-h...@lists.oasis-open.org



RE: [docbook-apps] Serializing DB 5 with XOM: wrong encoding

2011-03-22 Thread Mauritz Jeanson
|  -----Original Message-----
|  From: Denis Bradford 
|  
|  I've been trying to preprocess xincludes in my DocBook 5 
|  build with xom, 
|  using the incantation in Bob Stayton's Complete Guide:
|  
|  $ java -cp xom-1.2.1.jar:xom-samples.jar \
|  nu.xom.samples.XIncludeDriver source.xml > serialized.xml
|  
|  The xincludes resolve just fine, but the serialized doc's encoding
|  comes out as ISO-8859-1, so xom complains about UTF-8 characters in
|  the source. The output doc ends, incomplete, with a cascade of xom
|  Serializer errors.
|  
|  According to the XOM api doc, it should be possible to specify the 
|  encoding as UTF-8, but I haven't found how to do it from the command 
|  line. Anybody know how (or if there's a better solution)? 
|  I'm assuming 
|  the failure is on account of the encoding problem, since the 
|  document 
|  seems to process normally otherwise.


I just tried to process a couple of UTF-8 documents with XIncludeDriver
(using XOM 1.2.6), and there were no errors. Unencodable characters were
escaped as numeric character references in the output.

The encoding is hardcoded in XIncludeDriver.java: 

 Serializer outputter = new Serializer(System.out, "ISO-8859-1");

This may be a little unfortunate (and not too hard to fix), but
XIncludeDriver is just a sample application after all. 
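
To illustrate what escaping unencodable characters means here, the following is a rough, standalone sketch that uses only the JDK and no XOM classes. The class EncodingEscapeSketch and the helper escapeUnencodable are hypothetical names made up for this example, and the hexadecimal reference format is an assumption, so XOM's actual output may look different. The idea is simply that a character the target encoding cannot represent is written as a numeric character reference, while UTF-8 can carry every character as-is.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class EncodingEscapeSketch {

    // Hypothetical helper, not part of XOM: replaces every code point that
    // the target encoding cannot represent with a hexadecimal numeric
    // character reference, and copies everything else through unchanged.
    static String escapeUnencodable(String text, String encodingName) {
        CharsetEncoder encoder = Charset.forName(encodingName).newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            String ch = new String(Character.toChars(codePoint));
            if (encoder.canEncode(ch)) {
                out.append(ch);
            } else {
                out.append("&#x")
                   .append(Integer.toHexString(codePoint).toUpperCase())
                   .append(';');
            }
            // Advance by one or two chars depending on the code point.
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+00E9 fits in ISO-8859-1; U+2192 (rightwards arrow) does not.
        String sample = "caf\u00E9 \u2192 done";
        System.out.println(escapeUnencodable(sample, "ISO-8859-1"));
        System.out.println(escapeUnencodable(sample, "UTF-8"));
    }
}
```

With ISO-8859-1 only the arrow is turned into a reference; with UTF-8 the text passes through untouched, which is why changing the hardcoded encoding string in the sample should make the escaping unnecessary.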

Mauritz


