Dear Amir,
I recently developed a normalizer and tokenizer for the Persian language. We used the first version of this tool in our IWSLT submission this year and got an improvement of around 1.5 BLEU points over the baseline (which was tokenized using the Moses built-in tokenizer). I am going to make it publicly available soon, but in case you are interested and want to use it in your experiments now, I can share the code with you.

Best,
Amin

On 01/24/2014 10:21 AM, Barry Haddow wrote:

Hi Amir

You can use this tokeniser as a basis for creating your own tokeniser, or you can swap in your own tokeniser. For EMS a tokeniser should read from stdin and write to stdout, so you can run it like this:

    tokeniser [options] < input > output

cheers - Barry

On 24/01/14 08:58, amir haghighi wrote:

I use the built-in tokenizer in Moses. How can I change this tokenizer? Should I change the source code?

Regards
Amir

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
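
To illustrate the stdin/stdout convention Barry describes in the thread, here is a minimal sketch of a drop-in tokeniser written as a filter. It is an assumption-laden toy, not the Moses tokeniser and not Amin's Persian tool: the regular expression, the function name `tokenize_line`, and the one-sentence-per-line handling are all hypothetical choices made for illustration, and real rules for Persian would be considerably more involved.

```python
#!/usr/bin/env python3
"""Toy tokeniser filter: reads text on stdin, writes tokenised text to stdout."""
import re
import sys

# Hypothetical rule, for illustration only:
#   \w+      a run of Unicode word characters (letters, digits, underscore)
#   [^\w\s]  any other single non-whitespace character (punctuation, symbols)
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize_line(line: str) -> str:
    """Return the tokens of one input line joined by single spaces."""
    return " ".join(TOKEN_RE.findall(line))


def main() -> None:
    # One input line in, one tokenised line out, so line alignment
    # between source and target files is preserved.
    for line in sys.stdin:
        sys.stdout.write(tokenize_line(line) + "\n")


if __name__ == "__main__":
    main()
```

Saved under a hypothetical name such as my_tokeniser.py, it would be invoked in the same shape as the command quoted in the thread: python3 my_tokeniser.py < input > output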