see here:

http://jeremy.zawodny.com/blog/archives/010546.html

for a discussion of utf8 vs UTF-8

... now off to see England triumphant against Germany

Miles

On 27 June 2010 13:23, Miles Osborne <mi...@inf.ed.ac.uk> wrote:
> on the subject of UTF-8, i think the Moses tokeniser may be using the
> version that is too strict.
>
> i've just changed it to this:
>
> binmode(STDIN, ":encoding(UTF-8)");
> binmode(STDOUT, ":encoding(UTF-8)");
>
> and later on in the same file:
>
> open(PREFIX, "<:encoding(UTF-8)", "$prefixfile");
>
> see if this helps.
>
> Miles
>
> On 27 June 2010 13:15, Ingrid Falk <ingrid.f...@loria.fr> wrote:
>> Hi Cyrine,
>>
>> I think this is because tokenizer.perl expects UTF-8 input (on STDIN),
>> due to the binmode(STDIN, ':utf8'); line in the tokenizer script.
>>
>> Maybe your input is not UTF-8?
>>
>> Ingrid
>>
>> On 06/27/2010 01:08 PM, Cyrine NASRI wrote:
>>>
>>> Hello everyone,
>>> I am trying to run the tokenizer.perl script on my two development
>>> files. I get an error when running it, but I do not understand why.
>>> This message appears:
>>>
>>> /home/Bureau/moses/moses/scripts/tokenizer$ ./tokenizer.perl -l fr <
>>> /home/Bureau/work/test-fr.fr > /home/Bureau/work/input.tok
>>> Tokenizer Version 1.0
>>> Language: fr
>>> WARNING: No known abbreviations for language 'fr', attempting fall-back
>>> to English version...
>>> utf8 "\xE9" does not map to Unicode at ./tokenizer.perl line 47, <STDIN>
>>> line 1.
>>> Malformed UTF-8 character (fatal) at ./tokenizer.perl line 67, <STDIN>
>>> line 1.
>>>
>>> Thank you very much.
>>>
>>> Sincerely
>>> Cyrine
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
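
One way to check Ingrid's suggestion that the input may not be UTF-8 is sketched below. It is only an illustration and not part of tokenizer.perl: the helper names is_valid_utf8 and latin1_to_utf8 are made up for this example. It also assumes the file is really ISO-8859-1 (Latin-1), which the lone "\xE9" byte (an "é" in Latin-1) hints at but does not prove.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode FB_CROAK LEAVE_SRC);

# Hypothetical helper: return 1 if the byte string is well-formed UTF-8,
# 0 otherwise. FB_CROAK makes decode() die on the first malformed byte;
# LEAVE_SRC keeps decode() from modifying its argument.
sub is_valid_utf8 {
    my ($bytes) = @_;
    return eval { decode('UTF-8', $bytes, FB_CROAK | LEAVE_SRC); 1 } ? 1 : 0;
}

# Hypothetical helper: re-encode bytes as UTF-8, ASSUMING they are
# ISO-8859-1. If the real encoding is something else, the result is wrong.
sub latin1_to_utf8 {
    my ($bytes) = @_;
    return encode('UTF-8', decode('ISO-8859-1', $bytes));
}

my $latin1 = "caf\xE9";       # "café" as Latin-1 bytes (lone \xE9)
my $utf8   = "caf\xC3\xA9";   # "café" as UTF-8 bytes

print "latin1 sample: ", (is_valid_utf8($latin1) ? "valid" : "not valid"), " UTF-8\n";
print "utf8 sample:   ", (is_valid_utf8($utf8)   ? "valid" : "not valid"), " UTF-8\n";
print "converted sample matches the UTF-8 bytes\n"
    if latin1_to_utf8($latin1) eq $utf8;
```

If the check confirms that the file is not UTF-8, converting the whole file before tokenizing (for example with iconv -f ISO-8859-1 -t UTF-8, again assuming Latin-1 input) is probably simpler than changing the script.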