Hi All
I have an editor for XHTML snippets, built with CForms on Cocoon 2.1.7-dev.
It is very basic: it just uses a textarea.
I am having encoding issues that appeared only in the last week or so, and I cannot work out the solution.
Symptoms:
I have an accented character in my source document.
The document is displayed in the textarea, with that character corrupted.
If the form is saved, that character is written to disk corrupted.
The same accented character, output by i18n outside the textarea, displays correctly.
Scenario: My source document is UTF-8. My serializer (o.a.c.serialization.HTMLSerializer) outputs UTF-8. My web.xml's 'form-encoding' parameter is set to UTF-8. My browser recognises the document as UTF-8.
Behaviour:
Outside the textarea, the character "é" (e acute) is serialised correctly.
Within the textarea, the same character is serialised as a sequence of incorrect characters.
The brackets of the XHTML tags in the textarea are output as entities.
I log the string being edited. The accented character is correct before it is added to the widget, and still correct after it has been added but before the form is displayed.
If I edit the character to correct it in the form, the correct character is written to the file.
If I do not edit the character, the incorrect characters (the same ones shown in the textarea) are written to the file.
However, regardless of whether I edit it or not, my log message shows the correct character after the form has been submitted, before it has been written back.
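The exact corrupted characters are not shown above, so the following is an assumption rather than a diagnosis: the usual way one accented character turns into several wrong ones is UTF-8 bytes being decoded with a single-byte charset such as ISO-8859-1. This minimal Node.js sketch (`mojibake` is a hypothetical helper, not part of Cocoon) shows what that does to "é", and why saving the corrupted text as UTF-8 then writes four bytes instead of two:

```javascript
// Hypothetical illustration: how a UTF-8 "é" can turn into two wrong characters
// and then be written back to disk as four bytes instead of two.
function mojibake(text) {
  // Encode correctly as UTF-8: "é" (U+00E9) becomes the bytes C3 A9.
  const utf8Bytes = Buffer.from(text, "utf8");
  // Decode those bytes with the wrong single-byte charset (ISO-8859-1 here):
  // each byte becomes its own character, "Ã" (U+00C3) and "©" (U+00A9).
  return utf8Bytes.toString("latin1");
}

const original = "é";
const corrupted = mojibake(original);
// Saving the corrupted string as UTF-8 encodes each wrong character separately.
const savedBytes = Buffer.from(corrupted, "utf8");

console.log(Buffer.from(original, "utf8").toString("hex")); // c3a9
console.log(corrupted);                                     // Ã©
console.log(savedBytes.toString("hex"));                    // c383c2a9
```

If the file on disk shows this doubled byte pattern, the wrong decode happened somewhere between the source and the browser, not in the transformer that writes the file.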
Technique:
I read the XML Source to a String (to add to the textarea widget) like this:
    var string = org.apache.avalon.excalibur.io.IOUtil.toString(
        new java.io.BufferedInputStream(
            org.apache.cocoon.components.source.SourceUtil.getInputSource(
                resolver.resolveURI(uri)
            ).getByteStream()
        )
    );
    form.lookupWidget("xhtml").setValue(string);
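As far as I know, `IOUtil.toString(InputStream)` with no encoding argument converts bytes using the platform default encoding, so the resulting string depends on the JVM's default charset rather than on the document. Given what the log shows this may not be the cause, but it is cheap to rule out. This sketch uses plain Node.js rather than flowscript, and `decodeSource` is a hypothetical helper: the same bytes only come back correctly when the charset is named explicitly.

```javascript
// Hypothetical helper: decode raw source bytes with an explicitly named charset,
// instead of relying on whatever the platform default happens to be.
function decodeSource(bytes, charset) {
  // Buffer#toString accepts "utf8" and "latin1" (ISO-8859-1) among others.
  return Buffer.from(bytes).toString(charset);
}

// The bytes an XHTML snippet containing "é" has on disk when saved as UTF-8.
const onDisk = Buffer.from("<p>café</p>", "utf8");

const explicitUtf8 = decodeSource(onDisk, "utf8"); // charset matches the file
const singleByte = decodeSource(onDisk, "latin1"); // what a non-UTF-8 default would give

console.log(explicitUtf8); // <p>café</p>
console.log(singleByte);   // <p>cafÃ©</p>
```

In the flowscript, the equivalent (untested) would be to wrap the byte stream in `new java.io.InputStreamReader(stream, "UTF-8")` and read the string from that reader, so the charset no longer depends on the platform.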
I write the string from the widget back to the XML file like this:
    var source = resolver.resolveURI(uri);
    var dom = parser.parseDocument(
        new org.xml.sax.InputSource(
            new java.io.StringReader(form.lookupWidget("xhtml").getValue())
        )
    );

    // basically copied from the samples
    var outputStream = null;
    try {
        var tf = Packages.javax.xml.transform.TransformerFactory.newInstance();
        if (source instanceof Packages.org.apache.excalibur.source.ModifiableSource
                && tf.getFeature(Packages.javax.xml.transform.sax.SAXTransformerFactory.FEATURE)) {
            outputStream = source.getOutputStream();
            var transformerHandler = tf.newTransformerHandler();
            var transformer = transformerHandler.getTransformer();
            // JAXP's OutputKeys.INDENT takes "yes" or "no"
            transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.INDENT, "yes");
            transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.METHOD, "xml");
            transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.ENCODING, "UTF-8");
            transformerHandler.setResult(new Packages.javax.xml.transform.stream.StreamResult(outputStream));
            var streamer = new Packages.org.apache.cocoon.xml.dom.DOMStreamer(transformerHandler);
            // stream the DOM parsed from the widget value above
            streamer.stream(dom);
        } else {
            throw ("error.source.not-writeable");
        }
    } catch (e) {
        throw (e);
    } finally {
        if (outputStream != null) {
            try {
                outputStream.flush();
                outputStream.close();
            } catch (error) {
                cocoon.log.error("Could not flush/close outputstream: " + error);
            }
        }
    }
Can anyone see what I am doing wrong?
I have tried using org.apache.cocoon.components.serializers.HTMLSerializer, but CForms does not work with it.
I have tried different doctypes.
I tried pre-encoding the accented character as an entity in the source document, but the textarea then showed the raw entity.
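For what it's worth, both the entity-encoded tag brackets and the raw entity showing up in the textarea are what escaping the content once would produce: a textarea's content is text, so `<` and `&` must be escaped, and an `&#233;` already present in the source gets its `&` escaped again. A minimal sketch of generic text escaping (the `escapeText` helper is hypothetical and is not the HTMLSerializer's actual code):

```javascript
// Hypothetical helper: escape text for use as element content (e.g. inside <textarea>).
// Only markup-significant characters are replaced; "é" passes through untouched,
// so displaying it correctly relies on the page being served and read as UTF-8.
function escapeText(text) {
  return text
    .replace(/&/g, "&amp;") // must run first so the entities added below are not re-escaped
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
}

const plain = escapeText("<p>café</p>");
// A pre-encoded entity is escaped again, so the browser shows "&#233;" literally.
const preEncoded = escapeText("<p>caf&#233;</p>");

console.log(plain);      // &lt;p&gt;café&lt;/p&gt;
console.log(preEncoded); // &lt;p&gt;caf&amp;#233;&lt;/p&gt;
```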
I really have made this work before, but now I am completely stumped !!!
Thanks for any suggestions.
regards Jeremy
--------------------------------------------------------
If email from this address is not signed IT IS NOT FROM ME
Always check the label, folks !!!!!
--------------------------------------------------------