encoding problem in textarea

Jeremy Quinn Fri, 03 Dec 2004 05:44:08 -0800

Hi All

I have an editor for XHTML snippets built in CForms using 2.1.7-dev.
It is very basic; it just uses a textarea.
I am having encoding issues that appeared only in the last week or so, and I cannot work out the solution.

Symptoms:
I have an accented character in my source document.
The document is displayed in the textarea, with that character corrupted.
If you save, that character is saved corrupted to disk.
I have the same accented character output by i18n outside of the textarea, it displays correctly.

Scenario:
My source document is UTF-8.
My serializer (o.a.c.serialization.HTMLSerializer) outputs UTF-8.
My web.xml's 'form-encoding' parameter is set to UTF-8.
My browser recognises the document as UTF-8.

Behaviour:
The character "é" (e acute) when outside of the textarea is serialised as é.
The same character is serialised as √© when it is within the textarea.
The brackets of the XHTML tags in the textarea are output as entities.
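
For reference, the pair √© is exactly what the two UTF-8 bytes of é (0xC3 0xA9) look like when each byte is decoded on its own as MacRoman. The sketch below reproduces that in plain JavaScript for Node.js (not flowscript). The helper `decodeMacRomanSubset` and its two-entry table are made up for illustration. Whether such a misread is what actually happens somewhere in this pipeline is an assumption, not something established here.

```javascript
// The UTF-8 encoding of "é" (U+00E9) is the two bytes 0xC3 0xA9.
const utf8Bytes = Buffer.from("\u00E9", "utf8");

// Hypothetical demo table: only the two MacRoman byte values needed here.
// In MacRoman, 0xC3 is "√" (U+221A) and 0xA9 is "©" (U+00A9).
const MAC_ROMAN_SUBSET = { 0xc3: "\u221A", 0xa9: "\u00A9" };

// Hypothetical helper: decode bytes one at a time, as a single-byte charset would.
function decodeMacRomanSubset(bytes) {
  return Array.from(bytes, (b) => {
    if (b < 0x80) return String.fromCharCode(b); // ASCII is the same in both encodings
    if (!(b in MAC_ROMAN_SUBSET)) {
      throw new Error("byte not in demo table: 0x" + b.toString(16));
    }
    return MAC_ROMAN_SUBSET[b];
  }).join("");
}

const misread = decodeMacRomanSubset(utf8Bytes);
console.log(utf8Bytes.toString("hex"), "->", misread);
```

If this is the mechanism, the corruption would be introduced wherever bytes are turned into characters without UTF-8 being named, rather than in the serializer itself.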

I log the string being edited. The accented character is correct before being added to the widget, and still correct after being added to the widget but before display.

If I edit the character to correct it in the form, the correct character is written to the file.
If I do not edit the character, the incorrect characters ("√©", i.e. a square-root sign followed by a copyright sign) are written to the file.

However, regardless of whether I edit it or not, my log message shows the correct character after the form has been submitted, before it has been written back.

Technique:
I read the XML Source to a String (to add to the textarea widget) like this:

var string = org.apache.avalon.excalibur.io.IOUtil.toString(
  new java.io.BufferedInputStream(
    org.apache.cocoon.components.source.SourceUtil.getInputSource(
      resolver.resolveURI(uri)
    ).getByteStream()
  )
);
form.lookupWidget("xhtml").setValue(string);
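
The read above turns bytes into a String without naming a character encoding. If that call falls back on the platform default (an assumption; the behaviour of IOUtil is not checked here), a UTF-8 file would only decode correctly on a platform whose default is UTF-8. A minimal sketch of decoding with an explicit charset, in plain JavaScript for Node.js rather than flowscript; `decodeUtf8` and `sourceBytes` are hypothetical names:

```javascript
// Hypothetical helper: decode raw bytes as UTF-8, failing loudly on invalid
// input instead of silently substituting replacement characters.
function decodeUtf8(bytes) {
  const decoder = new TextDecoder("utf-8", { fatal: true });
  return decoder.decode(bytes);
}

// Stand-in for the bytes of a UTF-8 source document: <p>é</p>
const sourceBytes = Uint8Array.from([
  0x3c, 0x70, 0x3e, // <p>
  0xc3, 0xa9,       // é as two UTF-8 bytes
  0x3c, 0x2f, 0x70, 0x3e, // </p>
]);

const decodedText = decodeUtf8(sourceBytes);
console.log(decodedText);
```

In flowscript the corresponding idea would be to wrap the byte stream in a java.io.InputStreamReader constructed with "UTF-8" and read the characters from that, so the decoding no longer depends on the platform.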

I write the String from the widget back to the XML file like this:

var source = resolver.resolveURI(uri);
var dom = parser.parseDocument(
  new org.xml.sax.InputSource(
    new java.io.StringReader(form.lookupWidget("xhtml").getValue())
  )
);

// basically copied from the samples
var outputStream = null;
try {
  var tf = Packages.javax.xml.transform.TransformerFactory.newInstance();
  if (source instanceof Packages.org.apache.excalibur.source.ModifiableSource
      && tf.getFeature(Packages.javax.xml.transform.sax.SAXTransformerFactory.FEATURE)) {
    outputStream = source.getOutputStream();
    var transformerHandler = tf.newTransformerHandler();
    var transformer = transformerHandler.getTransformer();
    transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.INDENT, "true");
    transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.METHOD, "xml");
    transformer.setOutputProperty(Packages.javax.xml.transform.OutputKeys.ENCODING, "UTF-8");
    transformerHandler.setResult(new Packages.javax.xml.transform.stream.StreamResult(outputStream));
    var streamer = new Packages.org.apache.cocoon.xml.dom.DOMStreamer(transformerHandler);
    // stream the DOM parsed from the widget value above
    streamer.stream(dom);
  } else {
    throw ("error.source.not-writeable");
  }
} catch (e) {
  throw(e);
} finally {
  if (outputStream != null) {
    try {
      outputStream.flush();
      outputStream.close();
    } catch (error) {
      cocoon.log.error("Could not flush/close outputstream: " + error);
    }
  }
}


Can anyone see what I am doing wrong?

I have tried using org.apache.cocoon.components.serializers.HTMLSerializer, but CForms does not work with it.

I have tried different doctypes.

I tried pre-encoding the accented character as an entity in the source document, and the textarea showed the raw entity.

I really have made this work before, but now I am completely stumped !!!

Thanks for any suggestions.

regards Jeremy


--------------------------------------------------------

                  If email from this address is not signed
                                IT IS NOT FROM ME

                        Always check the label, folks !!!!!
--------------------------------------------------------
