On 20 May 2010 at 15:26, Thorsten Scherler wrote:

> On 20/05/2010, at 14:18, Sjur Moshagen wrote:
>>> ...
>>> Hmm, that is weird. Please try the following:
>>> - add a new contract that uses ñ, í and similar characters
>>> - see what comes out
>> 
>> I added a blank contract that just printed the same line of characters I 
>> used earlier for testing, and this is what came out:
>> 
>> This is a text containing problematic characters:
>> a á c č d đ n ŋ s š t ŧ z ž ae æ oe ø ao å a¨ ä o¨ ö g ǥ h ħ u ʉ i ɨ
>> 
>> That is, the text from the contract comes through just fine, but text coming 
>> from a standard Forrest v2 document gets garbled.
>> 
>> I have attached a picture of the page as it renders. The box comes from the 
>> document, the text at the bottom is from the contract.
> 
> Ok I see. 
> 
> Please post the dataUri you use for the contract. It seems that the utf-8 is 
> lost in this step. If you have the dataUrl of the contract see what is coming 
> out there, whether it is already scrambled or not.

I'm not sure how to do this, but I'll try. The dataURI used in the 
structurer is:

          <forrest:contract name="content-main" 
            dataURI="cocoon://#{$getRequest}.body.xml">   <!-- this is the dataURI -->
            <forrest:property name="content-main-conf">
              <headings type="boxed"/>
            </forrest:property>
          </forrest:contract>

which I take to mean:

http://localhost:8888/index.body.xml

The text returned by that URI is:

<?xml version="1.0" encoding="ISO-8859-1"?><div id="content"><h1>Divvun - Sámi proofing tools project</h1><div id="content-main">

          <div class="note"><div class="label">UTF-8 character test</div><div class="content">
                There seems to be problems with certain characters, but only in
                Dispatcher:<br xmlns:xi="http://www.w3.org/2001/XInclude"/>
                a á c &#269; d &#273; n &#331; s &#353; t &#359; z &#382; ae æ oe ø ao å a¨ ä o¨ ö g &#485; h &#295; u &#649; i &#616;
          </div></div>

  </div></div>

Two things to note here:

The encoding is declared as ISO-8859-1, which is wrong, and which causes all 
characters outside Latin-1 to be written as numeric character references. In 
the next step, those characters therefore survive intact, while the non-ASCII 
Latin-1 characters get garbled when their bytes are reinterpreted as UTF-8 
later - or something along those lines.
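
To illustrate what I think is happening, here is a small Python sketch. It is 
only a model of my guess, not of what Forrest or Cocoon actually does 
internally: the two-step pipeline and the function names are my own 
assumptions.

```python
# Hypothetical model of the suspected pipeline; not Forrest/Cocoon code.

SAMPLE = "a á c č d đ n ŋ ae æ oe ø ao å"


def serialize_as_latin1(text: str) -> bytes:
    """Assumed step 1: write ISO-8859-1, escaping everything outside Latin-1
    as numeric character references (as seen in index.body.xml)."""
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")


def reread_as_utf8(raw: bytes) -> str:
    """Assumed step 2: a later stage reads the same bytes as UTF-8.
    Bytes that are not valid UTF-8 become U+FFFD replacement characters."""
    return raw.decode("utf-8", errors="replace")


serialized = serialize_as_latin1(SAMPLE)
misread = reread_as_utf8(serialized)

print(serialized)  # č, đ, ŋ appear as &#269; &#273; &#331;; á, æ, ø, å are single bytes
print(misread)     # the character references are intact; á, æ, ø, å are damaged
```

In this model the characters outside Latin-1 are protected by being plain 
ASCII references, while the single Latin-1 bytes are not valid UTF-8 and get 
damaged. The exact garbage that shows up on the page may differ from what this 
sketch prints.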

I don't know where the encoding comes from - everything on my end is marked as 
UTF-8. I grepped for the string "ISO-8859-1" in the Forrest sources, and got 
many hits, but nothing that seemed to relate to Dispatcher.
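
One way to narrow this down might be to compare the declared encoding with the 
raw bytes actually returned for the dataURI. Below is a rough Python sketch of 
such a check. The helper names are made up for illustration, and the URL in 
the usage comment is just my reading of the dataURI above.

```python
import re
from urllib.request import urlopen


def declared_encoding(raw: bytes):
    """Return the encoding named in the XML declaration, or None if absent."""
    match = re.match(rb'<\?xml[^>]*encoding=["\']([A-Za-z0-9._-]+)["\']', raw)
    return match.group(1).decode("ascii") if match else None


def is_valid_utf8(raw: bytes) -> bool:
    """True if the bytes decode cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def fetch(url: str) -> bytes:
    """Fetch the raw, undecoded response body."""
    with urlopen(url) as response:
        return response.read()


# Usage (needs the local Forrest instance running):
#   raw = fetch("http://localhost:8888/index.body.xml")
#   print(declared_encoding(raw), is_valid_utf8(raw))
```

If the declaration says ISO-8859-1 and the bytes are not valid UTF-8, the 
conversion has already happened by this step; if the bytes are valid UTF-8 
despite the declaration, only the label is wrong.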

Sjur