On 21 May 2010, at 12:03, Thorsten Scherler wrote:
>> The text returned by that Uri is:
>>
>> <?xml version="1.0" encoding="ISO-8859-1"?><div id="content"><h1>Divvun -
>> Sámi proofing tools project</h1><div id="content-main">
>>
>> <div class="note"><div class="label">UTF-8 character test</div><div
>> class="content">
>> There seems to be problems with certain characters, but only in
>> Dispatcher:<br xmlns:xi="http://www.w3.org/2001/XInclude"/>
>> a á c č d đ n ŋ s š t ŧ z ž ae æ
>> oe ø ao å a¨ ä o¨ ö g ǥ h ħ u ʉ i ɨ
>> </div></div>
>>
>> </div></div>
>>
>> Two things to note here:
>>
>> The encoding is specified as ISO-8859-1, which is wrong,
>
> Yes, it should be UTF-8.
>>
...
>> I don't know where the encoding comes from - everything on my end is marked
>> as UTF-8. I grepped for the string "ISO-8859-1" in the Forrest sources, and
>> got many hits, but nothing that seemed to relate to Dispatcher.
>
> The *.body.xml comes from the dataModel.xmap:
>
> <!-- HTML rendered from intermediate format -->
> <map:match pattern="**.body.xml">
> <map:generate src="cocoon:/{1}.source.rewritten.xml" />
> <map:transform src="{lm:dataModel-html-document-to-html.xsl}">
> <map:parameter name="path" value="{1}.html" />
> </map:transform>
> <map:serialize />
> </map:match>
>
> The serializer here is the default one.
>
> We define it in the xmap as:
>
> <map:serializers default="xml" />
>
> That should read:
> <map:serializers default="xml-utf8" />
>
> I committed that change in revision 946939; please check whether it fixes the
> issue. I also added a test note to
> org.apache.forrest.plugin.internal.dispatcher/src/documentation/content/xdocs/index.xml
> so you can directly run "forrest run" in the plugin and see the outcome.
I tested it on my own site (the same document as earlier), and your change
FIXED the bug :)
All the garbled UTF-8 characters are now rendered correctly, both in the body
text and elsewhere.
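
For anyone finding this thread in the archive: as far as I understand it, the
fix works because the default serializer now points at a declaration that sets
the output encoding explicitly. Below is a rough sketch of what such a
declaration can look like in a Cocoon sitemap. The mime-type and the exact
layout are assumptions for illustration, not copied from the Forrest sources,
so check the actual xmap before relying on it.

```xml
<!-- Sketch only: a serializer named "xml-utf8" that writes UTF-8 output.
     The mime-type and layout are assumed, not taken from Forrest itself. -->
<map:components xmlns:map="http://apache.org/cocoon/sitemap/1.0">
  <!-- A bare <map:serialize /> in a pipeline uses whatever "default" names -->
  <map:serializers default="xml-utf8">
    <map:serializer name="xml-utf8" mime-type="text/xml"
                    src="org.apache.cocoon.serialization.XMLSerializer">
      <!-- Sets both the XML declaration and the byte encoding of the output -->
      <encoding>UTF-8</encoding>
    </map:serializer>
  </map:serializers>
</map:components>
```

With a declaration along these lines, the **.body.xml match quoted above would
not need to change, since its <map:serialize /> has no type attribute and so
follows the default.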
Thanks a lot!
Best,
Sjur