Probably the most important thing you can do is check whether your sentences are properly tokenized, i.e. words are separated from punctuation and tokens are separated by spaces. The Moses tokenizer works from the per-language config files in scripts/share/nonbreaking_prefixes. If your languages aren't included there yet, create a new file for each of them. Please consider sharing your new file with us once you've created it.
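As a rough way to spot-check tokenizer output, something like the Python sketch below could help. It is only a heuristic and not part of Moses: the function names are made up for illustration, and it assumes a prefix file has one entry per line, '#' comment lines, and an optional trailing marker such as #NUMERIC_ONLY# after an entry. It flags tokens that still have punctuation stuck to them, skipping abbreviations listed as nonbreaking prefixes.

```python
import unicodedata


def is_punct(ch):
    # Unicode general categories starting with "P" are punctuation.
    # Combining vowel signs (e.g. in Devanagari) are not in this group.
    return unicodedata.category(ch).startswith("P")


def parse_prefixes(lines):
    """Collect prefixes from the lines of a nonbreaking-prefix file.

    Assumed format: blank lines and lines starting with '#' are ignored;
    an entry may be followed by a marker such as '#NUMERIC_ONLY#'.
    """
    prefixes = set()
    for raw in lines:
        entry = raw.strip()
        if not entry or entry.startswith("#"):
            continue
        prefixes.add(entry.split()[0])
    return prefixes


def suspicious_tokens(line, prefixes=frozenset()):
    """Return tokens that still seem to have punctuation attached."""
    flagged = []
    for token in line.split():
        if all(is_punct(ch) for ch in token):
            continue  # a standalone punctuation token is fine
        if token.endswith(".") and token[:-1] in prefixes:
            continue  # a listed abbreviation keeps its period
        if is_punct(token[0]) or is_punct(token[-1]):
            flagged.append(token)
    return flagged


if __name__ == "__main__":
    prefixes = parse_prefixes(["# titles", "Mr", "No #NUMERIC_ONLY#"])
    for sentence in ["Mr. Smith arrived .", "Hello, world."]:
        print(sentence, "->", suspicious_tokens(sentence, prefixes))
```

Running this over a sample of your tokenized corpus and looking at the most frequent flagged tokens should show fairly quickly whether a prefix file for your language is missing entries.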
Another thing you should look at is normalising the spelling. In some languages, words can carry optional characters that make things difficult for SMT, e.g. clitic marks in Arabic.

On 27 April 2014 15:11, Vishal Goyal(विशाल गोयल) <[email protected]> wrote:

> Respected all,
> Greetings.
> We are working on SMT using MOSES. Please guide me which are the script
> that we may change and should change while developing any system using
> MOSES.
> And how can we get this information in length?
>
> Thanks in anticipation.
>
> --
> *Regards,*
> Vishal Goyal,
> Ph.D., M.Tech., MCA, M.C.S.D.
> Assistant Professor(Stage III),
> Department of Computer Science,
> Punjabi University Patiala-147002
> *[ICON 2014- http://ltrc.iiit.ac.in/icon/2014/
> <http://ltrc.iiit.ac.in/icon/2014/>]*
> [*Online Hindi to Punjabi Machine Translation Tool -*
> http://h2p.learnpunjabi.org ]
> *[Research Cell: An International Journal of Engineering Sciences,
> http://ijoes.vidyapublications.com <http://ijoes.vidyapublications.com>]*
>
>
> _______________________________________________
> Moses-support mailing list
> [email protected]
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
>

--
Hieu Hoang
Research Associate
University of Edinburgh
http://www.hoang.co.uk/hieu
