Dear Amir,

I recently developed a normalizer and tokenizer for the Persian language. We used the first version of this tool in our IWSLT submission this year and gained around 1.5 BLEU points over the baseline (which was tokenized with the built-in Moses tokenizer).
I am going to make it publicly available soon, but if you are interested and want to use it in your experiments now, I can share the code with you.

Best,
Amin

On 01/24/2014 10:21 AM, Barry Haddow wrote:
Hi Amir

You can use this tokeniser as a basis for creating your own tokeniser,
or you can swap in your own tokeniser. For EMS, a tokeniser should read
from stdin and write to stdout, so you can run it like this:

tokeniser [options] < input > output
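
As a rough illustration of that stdin/stdout contract, here is a minimal sketch of a replacement tokeniser in Python. It is not the Moses tokeniser and it is not tuned for Persian; the function name simple_tokenize and the splitting rule (separate each punctuation character from adjacent words) are assumptions made only for this example.

```python
#!/usr/bin/env python3
"""Hypothetical stand-in tokeniser: one sentence per line, stdin to stdout."""
import re
import sys

# A token is either a run of word characters or a single character that is
# neither a word character nor whitespace (i.e. punctuation or a symbol).
_TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def simple_tokenize(line):
    """Return the tokens of `line` joined by single spaces."""
    return " ".join(_TOKEN_RE.findall(line))


def main():
    # Read line by line so that large corpora are never held in memory.
    for line in sys.stdin:
        sys.stdout.write(simple_tokenize(line) + "\n")


if __name__ == "__main__":
    main()
```

Saved under a name of your choice (say simple_tokeniser.py, a hypothetical filename), it would be run in the same way as above: python3 simple_tokeniser.py < input > output. A real Persian tokeniser would need language-specific handling, for example of the zero-width non-joiner, which this sketch would split off as a separate token.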

cheers - Barry

On 24/01/14 08:58, amir haghighi wrote:
I use the built-in tokenizer in Moses.
How can I change this tokenizer? Should I change the source code?

Regards
Amir


_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support

