Hi Ryan,

Conceptually, the easiest way is to regard segmentation as a preprocessing (and postprocessing) step that the core model has nothing to do with. There is no need to modify the decoder itself in this case.
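
A minimal sketch of that pre/post-processing idea in Python, under some assumptions: the decoder is called as `moses -f moses.ini` and reads one sentence per line on stdin, the binary name and ini path are placeholders for your own setup, and the helper names (`desegment_thai`, `run_moses`, `translate`) are made up for illustration. The segmenter is passed in as a function, since it should be whatever you already used on the training data.

```python
import re
import subprocess
from typing import Callable

# Runs of spaces that sit between two Thai characters (Unicode block U+0E00-U+0E7F).
_THAI_GAP = re.compile(r"(?<=[\u0E00-\u0E7F]) +(?=[\u0E00-\u0E7F])")


def desegment_thai(text: str) -> str:
    """Postprocessing for English->Thai: drop spaces between Thai characters."""
    return _THAI_GAP.sub("", text)


def run_moses(line: str, moses_bin: str = "moses", ini: str = "moses.ini") -> str:
    """Send one sentence to the decoder on stdin and return its stdout.

    `moses_bin` and `ini` are placeholder paths (assumptions), not fixed names.
    """
    result = subprocess.run(
        [moses_bin, "-f", ini],
        input=line + "\n",
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def translate(
    line: str,
    preprocess: Callable[[str], str],
    decode: Callable[[str], str],
    postprocess: Callable[[str], str],
) -> str:
    """Wrap the decoder: preprocess, decode, postprocess. The model is untouched."""
    return postprocess(decode(preprocess(line)))
```

For Thai-English you would call something like `translate(line, my_segmenter, run_moses, str.strip)`, where `my_segmenter` stands for your own segmentation function; for English-Thai, `translate(line, str.strip, run_moses, desegment_thai)`. Starting one decoder process per sentence is slow, so for real use you would probably feed a whole file through instead.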

What you need is a light wrapper around the Moses decoder, for instance a simple shell script. If you have a way to segment the training data, you can apply the same segmentation right before translation for Thai-English. For English-Thai, the wrapper removes the spaces from the decoder output instead; I suspect that removing spaces is even less of a problem.

Regards
Mathias

> On 17 Dec 2017, at 13:08, Ryan Coughlin <rya...@sis.edu> wrote:
>
> Hi all,
>
> I'm trying to use Moses to handle Thai-English translation. As far as I
> know, this never has been done.
>
> Thai is a language without spacing between words. Running a
> word-segmentation script to put spaces in between words is rather trivial.
> When training, I've pre-segmented the sentences with spaces between the words
> and the training seems to go OK.
>
> My problem is with the decoder. Is there a way to modify it so that a Thai
> sentence without spaces will be segmented to a sentence with spaces and then
> decoded to a proper English sentence. And the reverse would be an English
> sentence would be input and the Thai no space sentence would be output. Does
> that make sense? Sorry for the noob question.
>
> Thank you for any and all help that you may give me.
>
> take care,
> Ryan
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support