On 13 February 2013 21:00, Per Tunedal <per.tune...@operamail.com> wrote:
> Well,
> I ran your script afterwards, and the Swedish characters were corrected
> - but the Danish ones were damaged:
>
> Before:
> <e><p><l>BlomkÃ¥l<s n="n"/></l><r>blomkål<s n="n"/></r></p></e>
> <e><p><l>BlÃ¥mussla<s n="n"/></l><r>blåmusling<s n="n"/></r></p></e>
> <e><p><l>Samlag<s n="n"/></l><r>bolde<s n="n"/></r></p></e>
> <e><p><l>Bomb<s n="n"/></l><r>bombe<s n="n"/></r></p></e>
> <e><p><l>Brandy_<s n="n"/></l><r>brandy<s n="n"/></r></p></e>
> <e><p><l>Hallonsläktet<s n="n"/></l><r>brombær<s n="n"/></r></p></e>
> <e><p><l>BröllopstÃ¥rta<s n="n"/></l><r>bryllupskage<s n="n"/></r></p></e>
> <e><p><l>Kvinnobröst<s n="n"/></l><r>bryst<s n="n"/></r></p></e>
> <e><p><l>Bröd<s n="n"/></l><r>brød<s n="n"/></r></p></e>
> <e><p><l>Bulgur<s n="n"/></l><r>bulgur<s n="n"/></r></p></e>
> <e><p><l>Bunsenbrännare<s n="n"/></l><r>bunsenbrænder<s n="n"/></r></p></e>
> <e><p><l>Böna<s n="n"/></l><r>bønne<s n="n"/></r></p></e>
> <e><p><l>Böna<s n="n"/></l><r>bønner<s n="n"/></r></p></e>
>
> After:
> <e><p><l>Blomkål<s n="n"/></l><r>blomk?l<s n="n"/></r></p></e>
> <e><p><l>Blåmussla<s n="n"/></l><r>bl?musling<s n="n"/></r></p></e>
> <e><p><l>Samlag<s n="n"/></l><r>bolde<s n="n"/></r></p></e>
> <e><p><l>Bomb<s n="n"/></l><r>bombe<s n="n"/></r></p></e>
> <e><p><l>Brandy_<s n="n"/></l><r>brandy<s n="n"/></r></p></e>
> <e><p><l>Hallonsläktet<s n="n"/></l><r>bromb?r<s n="n"/></r></p></e>
> <e><p><l>Bröllopstårta<s n="n"/></l><r>bryllupskage<s n="n"/></r></p></e>
> <e><p><l>Kvinnobröst<s n="n"/></l><r>bryst<s n="n"/></r></p></e>
> <e><p><l>Bröd<s n="n"/></l><r>br?d<s n="n"/></r></p></e>
> <e><p><l>Bulgur<s n="n"/></l><r>bulgur<s n="n"/></r></p></e>
> <e><p><l>Bunsenbrännare<s n="n"/></l><r>bunsenbr?nder<s n="n"/></r></p></e>
> <e><p><l>Böna<s n="n"/></l><r>b?nne<s n="n"/></r></p></e>
> <e><p><l>Böna<s n="n"/></l><r>b?nner<s n="n"/></r></p></e>
>
> That's strange, because your script corrected the file translated in the
> other direction OK.

Yes, because it was expecting the corrupted characters to be on the
right (the <r> side). To go the other way and repair the left (<l>)
side instead, it would need to be:

perl -MEncode -ane 'chomp;if(m!(<e><p><l>)([^<]*)(<s n="n"/></l><r>)([^<]*)(<s n="n"/></r></p></e>)!){$rec=encode("iso-8859-1",decode("utf-8", $2));if($2 eq lc($2)){$rec=lc($rec);}; print "$1$rec$3$4$5\n";}'

-- 
<Sefam> Are any of the mentors around?
<jimregan> yes, they're the ones trolling you
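
One way to make the direction stop mattering is to repair a field only when it really looks double-encoded. What follows is a minimal Perl sketch of that idea, not the script from this thread. The name fix_mojibake is a hypothetical helper, and the sketch assumes the damage is UTF-8 bytes that were misread as ISO-8859-1 and then encoded to UTF-8 again (as in "BlomkÃ¥l"). Text damaged through Windows-1252 would need a different table. Already-correct text such as "blomkål" fails the last check and is returned unchanged instead of ending up with "?" in it.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: undo one layer of UTF-8 double encoding.
# Takes raw bytes and returns raw bytes. If the input does not look
# double-encoded, it is returned unchanged.
sub fix_mojibake {
    my ($bytes) = @_;
    my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

    # Step 1: the input must be valid UTF-8 in the first place.
    my $chars = eval { decode('UTF-8', $bytes, $check) };
    return $bytes unless defined $chars;

    # Step 2: every character must fit into ISO-8859-1, otherwise the
    # text cannot be a misread byte string under our assumption.
    my $latin1 = eval { encode('iso-8859-1', $chars, $check) };
    return $bytes unless defined $latin1;

    # Step 3: the recovered bytes must themselves be valid UTF-8.
    # Correctly encoded text fails here, so it is kept as it was.
    my $ok = eval { decode('UTF-8', $latin1, $check); 1 };
    return $ok ? $latin1 : $bytes;
}

# Byte strings written with escapes so the example does not depend on
# the encoding of this file.
my $damaged = "Blomk\xC3\x83\xC2\xA5l";   # mojibake form of "Blomkål"
my $correct = "blomk\xC3\xA5l";           # correct UTF-8 for "blomkål"

print fix_mojibake($damaged) eq "Blomk\xC3\xA5l" ? "repaired\n"   : "not repaired\n";
print fix_mojibake($correct) eq $correct         ? "left alone\n" : "damaged\n";
```

With a check like this, the same function could be applied to both captured fields ($2 and $4 in the one-liner), regardless of which side of the entry carries the corruption.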