Probably the most important thing you can do is check that your sentences
are properly tokenized, i.e. words are separated from punctuation and
tokens are separated by spaces. The Moses tokenizer works from the config
files in
   scripts/share/nonbreaking_prefixes
for each language. If your language isn't included there yet, create a
new file for it. Please consider sharing your new file with us once you've
created it.
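
To give a rough idea of what such a prefix file controls, here is a small
Python sketch. It is not the tokenizer's actual code, only a simplified
illustration. It assumes the file has one prefix per line, that lines
starting with # are comments, and that a trailing #NUMERIC_ONLY# marks a
prefix that only protects its period when a number follows. The function
names load_prefixes and split_final_period are made up for this example.

```python
NUMERIC_MARK = "#NUMERIC_ONLY#"


def load_prefixes(lines):
    """Parse prefix-file lines into (always, numeric_only) sets."""
    always, numeric_only = set(), set()
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        if line.endswith(NUMERIC_MARK):
            # e.g. "No #NUMERIC_ONLY#": protected only before a number.
            numeric_only.add(line[: -len(NUMERIC_MARK)].strip())
        else:
            always.add(line)
    return always, numeric_only


def split_final_period(token, next_token, always, numeric_only):
    """Return True if the trailing period of `token` should be split off."""
    if token == "." or not token.endswith("."):
        return False
    stem = token[:-1]
    if stem in always:
        return False
    if stem in numeric_only and next_token[:1].isdigit():
        return False
    return True


# Hypothetical file contents, for illustration only.
sample = ["# titles", "Mr", "Dr", "", "No #NUMERIC_ONLY#"]
always, numeric_only = load_prefixes(sample)
```

With these sample entries, "Mr." keeps its period attached, "No." keeps it
only when the next token starts with a digit, and an ordinary word such as
"end." has its period split off. The real tokenizer applies further rules
on top of this.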

Another thing you should look at is normalising the spelling. In some
languages, words can have optional characters that make things difficult
for SMT, e.g. clitic marks in Arabic.
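
As a sketch of what such a normalisation step might look like, the Python
snippet below strips optional marks from Arabic text before training. It
rests on an assumption of mine: that the optional characters in question
are the short-vowel diacritics (U+064B to U+0652) and the tatweel (U+0640).
Which characters to drop really depends on your language and data, and the
function name normalise_spelling is only illustrative.

```python
import re
import unicodedata

# Assumed set of optional characters: Arabic short-vowel and related
# diacritics (U+064B..U+0652) plus the tatweel elongation mark (U+0640).
_OPTIONAL_MARKS = re.compile("[\u064B-\u0652\u0640]")


def normalise_spelling(text):
    """Return text in NFC form with the assumed optional marks removed."""
    text = unicodedata.normalize("NFC", text)
    return _OPTIONAL_MARKS.sub("", text)


# The same word written with and without short-vowel marks.
with_marks = "\u0643\u064E\u062A\u064E\u0628\u064E"
without_marks = "\u0643\u062A\u0628"
normalised = normalise_spelling(with_marks)
```

Whatever you do here has to be applied identically to the training data
and to the input at translation time, otherwise the spellings will not
match.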

On 27 April 2014 15:11, Vishal Goyal (विशाल गोयल) <[email protected]> wrote:

> Respected all,
> Greetings.
> We are working on SMT using MOSES. Please guide me which are the script
> that we may change and should change while developing any system using
> MOSES.
> And how can we get this information in length?
>
> Thanks in anticipation.
>
> --
> *Regards,*
> Vishal Goyal,
> Ph.D., M.Tech., MCA, M.C.S.D.
> Assistant Professor(Stage III),
> Department of Computer Science,
> Punjabi University Patiala-147002
> *[ICON 2014- http://ltrc.iiit.ac.in/icon/2014/]*
> [*Online Hindi to Punjabi Machine Translation Tool -*
> http://h2p.learnpunjabi.org ]
> *[Research Cell: An International Journal of Engineering Sciences,
> http://ijoes.vidyapublications.com]*
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>


-- 
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu