Hi Mike,
Maybe take a look at Rico's tool for handling unknown words in neural machine translation. I have been playing around with it for Russian-English and standard phrase-based SMT with some success. I am just not sure whether your small corpora will be enough to learn useful segmentations, though.

It's an unsupervised method for word segmentation. For Russian-English I created a code dictionary of the 100,000 most frequent segments per language. Unseen tokens will get segmented. The segmentation is not necessarily similar to a linguistically correct segmentation, though. You will probably want to try smaller numbers.

Best,
Marcin

On 2016-02-01 14:12, Michael Joyner wrote:
> I am trying to use Moses with Cherokee using the New Testament and Genesis as
> primary corpus. I am feeding it the WEB, BBE as source English texts at the
> moment.
>
> As Cherokee uses bound pronouns and no articles and has almost nil
> preposition analogues, (these features are mostly verb infixes), is there a
> technique for corpus adjustment that can be done to improve the phrase
> mapping between Cherokee and English?
>
> I am currently doing Cherokee => English.
>
> Thanks, Mike
> --
>
> WEB: World English Bible (Public Domain)
> BBE: Basic English Bible (Public Domain)
>
> * Learn to the Cherokee language: http://jalagigawoni.gnomio.com/ [2]
>
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support [1]

Links:
------
[1] http://mailman.mit.edu/mailman/listinfo/moses-support
[2] http://jalagigawoni.gnomio.com/
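P.S. To make the segmentation idea above a bit more concrete, here is a rough sketch of a byte-pair-encoding style learner in Python. It is not the tool's actual code, and I am assuming the tool works along these lines: the function names (learn_merges, merge_pair, segment), the "</w>" end-of-word marker, and the toy word list are all made up for illustration. The num_merges parameter plays the role of the dictionary size mentioned above, so for a small corpus you would set it far lower than 100,000.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(words, num_merges):
    """Learn up to `num_merges` merge operations from an iterable of word tokens."""
    # Start from single characters plus a hypothetical end-of-word marker.
    vocab = Counter(tuple(w) + ("</w>",) for w in words)
    merges = []
    for _ in range(num_merges):
        # Count how often each adjacent symbol pair occurs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair wins; ties are broken by the pair itself for determinism.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        new_vocab = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Toy corpus, purely illustrative.
    corpus = ["low", "low", "lower", "newest", "newest", "widest"]
    merges = learn_merges(corpus, 10)
    print(segment("lowest", merges))
```

Frequent words end up as single segments, while rare or unseen words fall apart into smaller pieces, which is why the result need not match a linguistic analysis.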