On 13 February 2013 21:00, Per Tunedal <per.tune...@operamail.com> wrote:
> Well,
> I ran your script afterwards, and the Swedish characters were corrected,
> but the Danish ones were damaged:
>
> Before:
> <e><p><l>BlomkÃ¥l<s n="n"/></l><r>blomkål<s n="n"/></r></p></e>
> <e><p><l>BlÃ¥mussla<s n="n"/></l><r>blåmusling<s n="n"/></r></p></e>
> <e><p><l>Samlag<s n="n"/></l><r>bolde<s n="n"/></r></p></e>
> <e><p><l>Bomb<s n="n"/></l><r>bombe<s n="n"/></r></p></e>
> <e><p><l>Brandy_<s n="n"/></l><r>brandy<s n="n"/></r></p></e>
> <e><p><l>Hallonsläktet<s n="n"/></l><r>brombær<s n="n"/></r></p></e>
> <e><p><l>Bröllopstårta<s n="n"/></l><r>bryllupskage<s n="n"/></r></p></e>
> <e><p><l>Kvinnobröst<s n="n"/></l><r>bryst<s n="n"/></r></p></e>
> <e><p><l>Bröd<s n="n"/></l><r>brød<s n="n"/></r></p></e>
> <e><p><l>Bulgur<s n="n"/></l><r>bulgur<s n="n"/></r></p></e>
> <e><p><l>Bunsenbrännare<s n="n"/></l><r>bunsenbrænder<s n="n"/></r></p></e>
> <e><p><l>Böna<s n="n"/></l><r>bønne<s n="n"/></r></p></e>
> <e><p><l>Böna<s n="n"/></l><r>bønner<s n="n"/></r></p></e>
>
> After:
> <e><p><l>Blomkål<s n="n"/></l><r>blomk?l<s n="n"/></r></p></e>
> <e><p><l>Blåmussla<s n="n"/></l><r>bl?musling<s n="n"/></r></p></e>
> <e><p><l>Samlag<s n="n"/></l><r>bolde<s n="n"/></r></p></e>
> <e><p><l>Bomb<s n="n"/></l><r>bombe<s n="n"/></r></p></e>
> <e><p><l>Brandy_<s n="n"/></l><r>brandy<s n="n"/></r></p></e>
> <e><p><l>Hallonsläktet<s n="n"/></l><r>bromb?r<s n="n"/></r></p></e>
> <e><p><l>Bröllopstårta<s n="n"/></l><r>bryllupskage<s n="n"/></r></p></e>
> <e><p><l>Kvinnobröst<s n="n"/></l><r>bryst<s n="n"/></r></p></e>
> <e><p><l>Bröd<s n="n"/></l><r>br?d<s n="n"/></r></p></e>
> <e><p><l>Bulgur<s n="n"/></l><r>bulgur<s n="n"/></r></p></e>
> <e><p><l>Bunsenbrännare<s n="n"/></l><r>bunsenbr?nder<s n="n"/></r></p></e>
> <e><p><l>Böna<s n="n"/></l><r>b?nne<s n="n"/></r></p></e>
> <e><p><l>Böna<s n="n"/></l><r>b?nner<s n="n"/></r></p></e>
>
> That's strange, because your script correctly fixed the file translated
> in the other direction.

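Yes, because it expected the corrupted characters to be on the right-hand
(<r>) side. In your file the Danish text on the right was already correct
UTF-8, so the script "repaired" it into ISO-8859-1 bytes, which are not
valid UTF-8; that is why å, æ and ø now show up as "?". To go the other
way, it would need to be: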
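perl -MEncode -ne 'chomp;
  # repair the <l> (left-hand) text this time, leave <r> as it is
  if (m!(<e><p><l>)([^<]*)(<s n="n"/></l><r>)([^<]*)(<s n="n"/></r></p></e>)!) {
    $rec = encode("iso-8859-1", decode("utf-8", $2));
    if ($2 eq lc($2)) { $rec = lc($rec); }
    print "$1$rec$3$4$5\n";
  }'

(The regex must stay on one line. Note that only lines matching the
pattern are printed; anything else in the input is dropped.)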
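To see why the same step fixes one side and breaks the other, here is a
small self-contained Perl sketch that applies it to one double-encoded
string and one correct one. The helper name `undouble` and the sample
words are illustrative, not part of the original script; it assumes the
file is UTF-8 and that the broken side was double-encoded exactly once
(UTF-8 bytes read as ISO-8859-1 and saved as UTF-8 again).

```perl
#!/usr/bin/perl
# Sketch only: the decode/encode step from the one-liner, wrapped in a
# hypothetical helper so its effect on each side can be checked directly.
use strict;
use warnings;
use Encode qw(encode decode);

# Undo one level of double encoding: read the bytes as UTF-8, then write
# the resulting characters back out as ISO-8859-1 bytes.
sub undouble {
    my ($bytes) = @_;
    return encode("iso-8859-1", decode("utf-8", $bytes));
}

# Correct UTF-8 "Blomkål", then a double-encoded copy of it:
# "å" (C3 A5) misread as Latin-1 "Ã¥" and re-encoded as C3 83 C2 A5.
my $good_l   = encode("utf-8", "Blomk\x{e5}l");
my $broken_l = encode("utf-8", decode("iso-8859-1", $good_l));

# Correct UTF-8 "blomkål", as on the Danish <r> side.
my $good_r = encode("utf-8", "blomk\x{e5}l");

# Left side: the step recovers the correct UTF-8 bytes.
my $fixed = undouble($broken_l);

# Right side: correct UTF-8 is pushed down to Latin-1, leaving a lone
# E5 byte that is not valid UTF-8 (displayed as "?" or U+FFFD).
my $damaged = undouble($good_r);

printf "%-8s %s\n", "broken:",  unpack("H*", $broken_l);  # 426c6f6d6bc383c2a56c
printf "%-8s %s\n", "fixed:",   unpack("H*", $fixed);     # 426c6f6d6bc3a56c
printf "%-8s %s\n", "damaged:", unpack("H*", $damaged);   # 626c6f6d6be56c
```

Applied to text that is already correct, the step strips a layer of
encoding that was never there, which is what happened to blomkål, brød
and bønne above.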

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you

_______________________________________________
Apertium-stuff mailing list
Apertium-stuff@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/apertium-stuff
