On 20 May 2010 at 15:26, Thorsten Scherler wrote:
> On 20/05/2010, at 14:18, Sjur Moshagen wrote:
>>> ...
>>> Hmm, that is weird. Please try the following:
>>> - add a new contract that uses ñ, í and similar characters
>>> - see what comes out
>>
>> I added a blank contract that just printed the same line of characters I
>> used earlier for testing, and this is what came out:
>>
>> This is a text containing problematic characters:
>> a á c č d đ n ŋ s š t ŧ z ž ae æ oe ø ao å a¨ ä o¨ ö g ǥ h ħ u ʉ i ɨ
>>
>> That is, the text from the contract comes through just fine, but text coming
>> from a standard Forrest v2 document gets garbled.
>>
>> I have attached a picture of the page as it renders. The box comes from the
>> document, the text at the bottom is from the contract.
>
> Ok I see.
>
> Please post the dataUri you use for the contract. It seems that the utf-8 is
> lost in this step. If you have the dataUrl of the contract see what is coming
> out there, whether it is already scrambled or not.

I'm not sure how to do this, but I'll try. The dataURI used in the
structurer is:

<forrest:contract name="content-main" dataURI="cocoon://#{$getRequest}.body.xml"> <-- this is the dataURI
  <forrest:property name="content-main-conf">
    <headings type="boxed"/>
  </forrest:property>
</forrest:contract>

which I take to mean:

http://localhost:8888/index.body.xml

The text returned by that URI is:

<?xml version="1.0" encoding="ISO-8859-1"?><div id="content"><h1>Divvun - Sámi proofing tools project</h1><div id="content-main">
<div class="note"><div class="label">UTF-8 character test</div><div class="content">
There seems to be problems with certain characters, but only in Dispatcher:<br xmlns:xi="http://www.w3.org/2001/XInclude"/>
a á c č d đ n ŋ s š t ŧ z ž ae æ oe ø ao å a¨ ä o¨ ö g ǥ h ħ u ʉ i ɨ
</div></div>
</div></div>

Two things to note here:

- The encoding is specified as ISO-8859-1, which is wrong, and which causes
  all characters outside Latin1 to be encoded as numeric entities.
- In the next step, this means that all non-ASCII, non-Latin1 characters
  survive correctly, while the Latin1 characters get messed up when they are
  reinterpreted as UTF-8 later - or something along these lines.

I don't know where the encoding comes from - everything on my end is marked as
UTF-8. I grepped for the string "ISO-8859-1" in the Forrest sources and got
many hits, but nothing that seemed to relate to Dispatcher.

Sjur
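
P.S. To make the guess above concrete, here is a minimal Python sketch of the failure I suspect. It is not Forrest or Cocoon code. The sample string and the variable names are made up for illustration, and the assumption that a later stage decodes the ISO-8859-1 bytes as UTF-8 is only my hypothesis.

```python
import html

# Hypothetical sample: two Latin1 characters (á, ø) and two outside Latin1 (č, ŋ).
text = "a á č ø ŋ"

# Step 1 (assumed): the document is serialized as ISO-8859-1.
# Characters that do not exist in Latin1 are written as numeric entities.
latin1_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")

# Step 2 (assumed): a later stage wrongly reads those bytes as UTF-8.
# The single-byte Latin1 characters are not valid UTF-8 and get replaced.
misread = latin1_bytes.decode("utf-8", errors="replace")

# Step 3: the browser resolves the numeric entities, which are pure ASCII
# and therefore unaffected by the wrong decoding.
rendered = html.unescape(misread)

print(latin1_bytes)
print(rendered)
```

If this is what happens, č and ŋ come out intact because they travel as ASCII entities, while á and ø are lost. That would match the page I see.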