Hi Mike, 

Maybe take a look at Rico's tool for handling unknown words in neural
machine translation. I have been playing around with that for
Russian-English and standard phrase-based SMT with some success. I am
just not sure if your small corpora will be enough to learn useful
segmentations though. 

It's an unsupervised method for word segmentation. For Russian-English I
created a code dictionary of the 100,000 most frequent segments per
language. Unseen tokens get split into known segments. The segmentation
is not necessarily similar to a linguistically correct segmentation,
though. With your smaller corpora you will probably want to try smaller
numbers.
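
To give a rough idea of how this kind of segmentation works, here is a
toy sketch in Python. It assumes a byte-pair-encoding style method:
repeatedly merge the most frequent adjacent pair of symbols. This is not
the tool's actual code. The function names (learn_merges, merge_pair,
segment), the "</w>" end-of-word marker and the sample word counts are
all made up for illustration.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    out = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            out.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            out.append(symbols[i])
            i += 1
    return tuple(out)


def learn_merges(word_freqs, num_merges):
    """Learn up to `num_merges` merge operations from a {word: count} dict."""
    # Each word starts as a sequence of characters plus an end-of-word marker.
    vocab = {tuple(word) + ("</w>",): freq for word, freq in word_freqs.items()}
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for symbols, freq in vocab.items():
            for a, b in zip(symbols, symbols[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break  # every word is already a single symbol
        # Most frequent pair; ties are broken by the pair itself so runs are repeatable.
        best = max(pairs, key=lambda p: (pairs[p], p))
        merges.append(best)
        vocab = {merge_pair(symbols, best): freq for symbols, freq in vocab.items()}
    return merges


def segment(word, merges):
    """Segment a (possibly unseen) word by replaying the learned merges in order."""
    symbols = tuple(word) + ("</w>",)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    # Hypothetical word counts standing in for a real training corpus.
    counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges = learn_merges(counts, num_merges=10)
    # "lowest" is not in the counts, so it is split into learned segments.
    print(segment("lowest", merges))
```

More merges give longer, more word-like segments. Fewer merges keep the
segments short, which is probably the safer direction for a small corpus.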

Best, 

Marcin 

On 2016-02-01 14:12, Michael Joyner wrote:

> I am trying to use Moses with Cherokee using the New Testament and Genesis as 
> primary corpus. I am feeding it the WEB, BBE as source English texts at the 
> moment.
> 
> As Cherokee uses bound pronouns and no articles and has almost nil 
> preposition analogues, (these features are mostly verb infixes), is there a 
> technique for corpus adjustment that can be done to improve the phrase 
> mapping between Cherokee and English?
> 
> I am currently doing Cherokee => English.
> 
> Thanks, Mike 
> -- 
> 
> WEB: World English Bible (Public Domain) 
> BBE: Basic English Bible (Public Domain) 
> 
> * Learn the Cherokee language: http://jalagigawoni.gnomio.com/ [2]
> 
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support [1]

 

Links:
------
[1] http://mailman.mit.edu/mailman/listinfo/moses-support
[2] http://jalagigawoni.gnomio.com/
