Re: [Moses-support] handling no-space languages in the decoder

Mathias Müller Mon, 18 Dec 2017 00:35:43 -0800

Hi Ryan

Conceptually, the easiest way is to regard segmentation as a preprocessing (and 
postprocessing) step that the core model has nothing to do with. You should not 
bother to modify the decoder itself in this case.


You will need a light wrapper around the Moses decoder, for instance a simple 
shell script. For Thai-English, apply the same segmentation you used on the 
training data to each input sentence right before translation. For 
English-Thai, remove the spaces from the decoder output after translation.
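
To make the idea concrete, here is a minimal sketch of such a wrapper for the 
Thai-English direction, written in Python rather than shell. It assumes your 
segmenter is a command that reads raw Thai on stdin and writes space-separated 
Thai on stdout, and that the decoder is started as "moses -f moses.ini". The 
names segment_thai.py and moses.ini are placeholders, not files I know you 
have.

```python
import shlex
import subprocess
import sys

# Hypothetical commands: substitute your own segmenter and model path.
SEGMENT_CMD = "python3 segment_thai.py"
DECODER_CMD = "moses -f moses.ini"


def run_filter(command, text):
    """Pipe text through an external command and return what it prints."""
    result = subprocess.run(
        shlex.split(command),
        input=text,
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise if the command exits with an error
    )
    return result.stdout


def translate_thai(text, segment_cmd=SEGMENT_CMD, decoder_cmd=DECODER_CMD):
    """Segment raw Thai first, then hand the segmented text to the decoder."""
    segmented = run_filter(segment_cmd, text)
    return run_filter(decoder_cmd, segmented)


def main():
    # Reads unsegmented Thai from stdin, writes the translation to stdout.
    sys.stdout.write(translate_thai(sys.stdin.read()))
```

If you call main() at the bottom of the file, the script behaves like the 
decoder itself, just with unsegmented input: one sentence per line on stdin, 
translations on stdout. The decoder is untouched.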

I suspect that removing spaces is even less of a problem.
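
As a rough sketch of that postprocessing step, the Python function below 
deletes spaces that sit between two Thai characters and leaves spaces next to 
Latin letters or digits alone. Keeping those spaces is my own assumption about 
what you want, and the function name is made up for illustration. Note that 
it also removes any space the decoder produced between Thai clauses.

```python
import re

# One or more spaces with a Thai-block character (U+0E00 to U+0E7F) on each side.
_THAI = "\u0e00-\u0e7f"
_INTER_THAI_SPACE = re.compile(f"(?<=[{_THAI}]) +(?=[{_THAI}])")


def remove_thai_spaces(line):
    """Join space-separated Thai tokens; keep spaces next to non-Thai text."""
    return _INTER_THAI_SPACE.sub("", line)
```

You would apply this to each line of decoder output in the English-Thai 
direction.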

Regards
Mathias

> On 17 Dec 2017, at 13:08, Ryan Coughlin <rya...@sis.edu> wrote:
> 
> Hi all,
> 
>   I'm trying to use Moses for Thai-English translation. As far as I 
> know, this has never been done.
> 
>   Thai is a language without spacing between words. Running a 
> word-segmentation script to put spaces in between words is rather trivial. 
> When training, I've pre-segmented the sentences with spaces between the words 
> and the training seems to go OK.
> 
>   My problem is with the decoder. Is there a way to modify it so that a Thai 
> sentence without spaces is first segmented into a sentence with spaces and 
> then decoded into a proper English sentence? In the reverse direction, an 
> English sentence would be the input and a Thai sentence without spaces would 
> be the output. Does that make sense? Sorry for the noob question.
> 
>   Thank you for any and all help that you may give me.
> 
> take care,
> Ryan
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support

