see here:

http://jeremy.zawodny.com/blog/archives/010546.html

for a discussion of utf8 vs. UTF-8
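For reference, here is a minimal Perl sketch of what a strict UTF-8 check does with input like the one in the error below. It is not part of the Moses scripts, and the helper names strict_utf8_decode and to_utf8_bytes are hypothetical. It uses only the core Encode module. A lone \xE9 byte is malformed whether it is read as lax utf8 or strict UTF-8, and the error log shows exactly that, so changing the binmode layer alone may not be enough. The sketch ASSUMES that any line failing the strict check is Latin-1 (ISO-8859-1), where 0xE9 is "é", and re-encodes that line as UTF-8.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: return the decoded character string if $bytes is
# strictly valid UTF-8, or undef if it is not. FB_CROAK makes decode() die
# on the first malformed sequence; LEAVE_SRC stops it from modifying $bytes.
sub strict_utf8_decode {
    my ($bytes) = @_;
    my $chars = eval {
        decode('UTF-8', $bytes, Encode::FB_CROAK | Encode::LEAVE_SRC);
    };
    return $chars;    # undef if decode() died inside the eval
}

# Hypothetical helper: return $bytes as UTF-8 bytes. Valid UTF-8 passes
# through unchanged; anything else is ASSUMED to be ISO-8859-1 (Latin-1),
# where the single byte 0xE9 is "e acute", and is re-encoded as UTF-8.
sub to_utf8_bytes {
    my ($bytes) = @_;
    my $chars = strict_utf8_decode($bytes);
    $chars = decode('ISO-8859-1', $bytes) unless defined $chars;
    return encode('UTF-8', $chars);
}
```

You can wrap it as binmode(STDIN); binmode(STDOUT); print to_utf8_bytes($_) while <STDIN>; and use it as a filter in front of tokenizer.perl. To convert a whole file up front instead, again assuming it is Latin-1, iconv -f ISO-8859-1 -t UTF-8 test-fr.fr > test-fr.utf8.fr does the same job. Guessing Latin-1 is only a heuristic. For example, Windows-1252 text differs from it in the 0x80 to 0x9F range.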

... now off to see England triumphant against Germany

Miles

On 27 June 2010 13:23, Miles Osborne <mi...@inf.ed.ac.uk> wrote:
> On the subject of UTF-8, I think the Moses tokeniser may be using the
> version that is too strict.
>
> I've just changed it to this:
>>
> binmode(STDIN, ":encoding(UTF-8)");
> binmode(STDOUT, ":encoding(UTF-8)");
>>
>
>
> and later on in the same file:
>>
> open(PREFIX, "<:encoding(UTF-8)", "$prefixfile");
>>
>
> See if this helps.
>
> Miles
>
> On 27 June 2010 13:15, Ingrid Falk <ingrid.f...@loria.fr> wrote:
>> Hi Cyrine,
>>
>> I think this is because tokenizer.perl expects UTF-8 input (on STDIN).
>>
>> This is because of the binmode(STDIN, ':utf8'); line in the tokenizer
>> script.
>>
>> Perhaps your input is not UTF-8?
>>
>> Ingrid
>>
>> On 06/27/2010 01:08 PM, Cyrine NASRI wrote:
>>>
>>> Hello everyone,
>>> I am trying to run the tokenizer.perl script on my two development files.
>>> I get an error when running it, but I do not understand why.
>>> A message appears:
>>>
>>> /home/Bureau/moses/moses/scripts/tokenizer$ ./tokenizer.perl -l fr < /home/Bureau/work/test-fr.fr > /home/Bureau/work/input.tok
>>> Tokenizer Version 1.0
>>> Language: fr
>>> WARNING: No known abbreviations for language 'fr', attempting fall-back to English version...
>>> utf8 "\xE9" does not map to Unicode at ./tokenizer.perl line 47, <STDIN> line 1.
>>> Malformed UTF-8 character (fatal) at ./tokenizer.perl line 67, <STDIN> line 1.
>>>
>>> Thank you very much.
>>>
>>> Sincerely
>>> Cyrine
>>>
>>>
>>>
>>> _______________________________________________
>>> Moses-support mailing list
>>> Moses-support@mit.edu
>>> http://mailman.mit.edu/mailman/listinfo/moses-support
>>
>
>
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
