OK, I was able to get one of the question marks to go away (leaving a single question mark where the space should be). Here is what I changed:
serializer.setOutputStream(new PrintStream(new FileOutputStream(results), false, "UTF-8"));

and to read the file....

InputStreamReader fileReader = new InputStreamReader(new FileInputStream(html), "UTF-8");
BufferedReader reader = new BufferedReader(fileReader);
log.debug("Encoding for " + html + ": " + fileReader.getEncoding());

....this prints "UTF8" as the encoding (without the dash). What's up with that? Anyway, I think we are getting closer.

On 11/30/05, Craig McDaniel <[EMAIL PROTECTED]> wrote:
> I've been able to debug this a little bit, and it seems that, even
> though I am setting the output encoding to UTF-8, it is being written
> as ASCII. Since we can't get much farther without posting code, here
> goes:
>
> Serializer serializer = SerializerFactory.getSerializer(props);
> log.debug("Output Encoding: " + serializer.getOutputFormat().getProperty("encoding"));
> serializer.setOutputStream(new FileOutputStream(results));
> filters[lastFilter].setContentHandler(serializer.asContentHandler());
> filters[lastFilter].parse(new InputSource(new FileReader(xmlFile)));
> log.debug("Finished the transformation");
>
> The first log message indeed prints "Output Encoding: UTF-8". However,
> when I create a FileReader for this same File ("results" in the code
> above), and do file.getEncoding(), it prints "ASCII". Also, when I
> look at the file with less, I see "General<C2><A0>Electric" and in
> emacs, I see "General??Electric". This is just an XSL transform up to
> this point, nothing FOP-specific (though the file is an FO document),
> so perhaps the Xalan list is the proper place for this question?
>
> Here is the code for the Reader:
>
> FileReader fileReader = new FileReader(foFile);
> BufferedReader reader = new BufferedReader(fileReader);
> log.debug("Encoding for " + foFile + ": " + fileReader.getEncoding());
>
> Again, this prints "Encoding for /tmp/quarterly40215.xml: ASCII". At
> this point, the reader is used to read the file into a byte array.
> Then it is wrapped in a ByteArrayInputStream and fed to the FOP
> Driver. Are we any closer?
>
>
> On 11/25/05, Craig McDaniel <[EMAIL PROTECTED]> wrote:
> > On 11/25/05, Andreas L Delmelle <[EMAIL PROTECTED]> wrote:
> > > On Nov 25, 2005, at 22:14, Craig McDaniel wrote:
> > >
> > > > I am trying to debug a PDF rendering for a client where non-breaking
> > > > spaces are coming out as double question marks "??". FOP is being
> > > > called from a servlet. I have tried using the fop command line tool
> > > > and cannot reproduce the problem. I have written a simple servlet on
> > > > another system that functionally does the same thing, and cannot
> > > > reproduce the problem there either.
> > > >
> > > > Any ideas what could cause this? Is it some kind of character encoding
> > > > issue?
> > >
> > > Indeed. The question marks are most likely related to:
> > > http://java.sun.com/j2se/1.4.2/docs/api/java/nio/charset/CharsetEncoder.html
> > >
> > > > The non-breaking space entity is being used. What should my next step
> > > > be in debugging this?
> > >
> > > Firstly: are you still using FOP 0.20.5? If so, can you try out the
> > > recent alpha release, and report if the problem still occurs?
> >
> > I am using 0.20.5. Unfortunately, I do not have access to deploy
> > changes to the server at this time, so I am unable to test changes in
> > the only environment where the problem is happening ;-(
> >
> > > If you can't (or are already using FOP 0.90alpha), I think the best
> > > bet is to go looking for places --in the servlet code, I presume--
> > > where an XML declaration is hard-coded as a String literal or where a
> > > Charset is used that's different from the default (= UTF-8).
> >
> > The original data file has no XML declaration. The stylesheet has one,
> > but does not have an encoding attribute. The non-breaking space entities
> > are in the XSL, by the way.
> >
> > I almost feel like I am debugging this thing blind.
> > I do have the source code, but it is too spread out to post here. It
> > might be worth pointing out that the XSL is applied to the XML data and
> > sent to a ByteArrayOutputStream. The byte array is then stored and later
> > passed into the FOP driver as a ByteArrayInputStream. Likewise, the output
> > of the driver is written to a byte array and finally, it gets sent to the
> > browser with response.getOutputStream().write(bytes). Not the way I
> > would have done it. Anyway, like I said, I coded up a servlet just
> > like this one and could not reproduce the problem in my own
> > environment. Perhaps this is a default encoding problem.
> >
> > > HTH!
> >
> > Absolutely, thanks for your help!
> >
> > > Cheers,
> > >
> > > Andreas
> >
> >
> > --
> > Craig McDaniel
>
>
> --
> Craig McDaniel

--
Craig McDaniel
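For anyone finding this thread later: the symptoms above are consistent with a charset mismatch, and the small program below is a sketch that reproduces them. It assumes, as the FileReader output suggests, that the server's default charset is US-ASCII. The class and method names are made up for illustration, and it uses java.nio.charset.StandardCharsets, which needs Java 7 or later (newer than the JDK in this thread). U+00A0 encodes to the two bytes C2 A0 in UTF-8 (the "<C2><A0>" seen in less). Decoding those bytes as ASCII turns each one into a replacement character, which an ASCII encoder then writes as "?", giving "??". Decoding as UTF-8 leaves a single U+00A0, which still becomes one "?" if the text is later turned back into bytes with an ASCII encoder; that is only a guess at what the "read the file into a byte array" step does, since its code isn't shown. The last line shows that InputStreamReader.getEncoding() reports the historical name "UTF8" rather than the canonical "UTF-8".

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: reproduces the "??" and "?" symptoms from this thread.
final class NbspEncodingDemo {

    // "General" + U+00A0 (no-break space) + "Electric"
    static final String TEXT = "General\u00A0Electric";

    // Decode the bytes with readAs, then re-encode with US-ASCII (the assumed
    // server default). Bytes that are malformed for the decoder become U+FFFD;
    // characters ASCII cannot represent become '?' when encoded.
    static String roundTrip(byte[] bytes, Charset readAs) {
        String decoded = new String(bytes, readAs);
        byte[] ascii = decoded.getBytes(StandardCharsets.US_ASCII);
        return new String(ascii, StandardCharsets.US_ASCII);
    }

    // Name reported by InputStreamReader.getEncoding() for the given charset.
    // getEncoding() returns null once the reader is closed, so it is read first.
    static String readerEncodingName(Charset cs) throws IOException {
        try (InputStreamReader r =
                new InputStreamReader(new ByteArrayInputStream(new byte[0]), cs)) {
            return r.getEncoding();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] utf8 = TEXT.getBytes(StandardCharsets.UTF_8);
        // "General" is 7 chars, so the no-break space starts at index 7
        System.out.printf("UTF-8 bytes: %02X %02X%n", utf8[7] & 0xFF, utf8[8] & 0xFF); // C2 A0

        // UTF-8 bytes read as ASCII: two invalid bytes become two '?'
        System.out.println(roundTrip(utf8, StandardCharsets.US_ASCII)); // General??Electric

        // Read correctly as UTF-8, but written back out as ASCII: one '?'
        System.out.println(roundTrip(utf8, StandardCharsets.UTF_8)); // General?Electric

        // Historical name, not the canonical "UTF-8"
        System.out.println(readerEncodingName(StandardCharsets.UTF_8)); // UTF8
    }
}
```

Under that assumption, one way to avoid the remaining "?" is to keep the FO document as bytes end to end (for example, handing the ByteArrayOutputStream's contents straight to a ByteArrayInputStream) instead of passing it through a Reader or String, so that no charset conversion happens in between.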