Re: [Moses-support] Tokenization

Hieu Hoang
Sun, 12 Apr 2020 13:27:45 -0700

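The Moses tokenizer reads its input from standard input, so the file has to
be redirected or piped into it. If nothing is fed to it, it waits for input
after printing the "Number of threads" line.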
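
As a rough illustration of feeding a file to a program on standard input,
here is a minimal Python sketch. The helper name tokenize_file and the file
names in the commented usage are made up for the example; the script path
assumes a default checkout of mosesdecoder, and "-l en" is the tokenizer's
language option. In a shell the same thing is done with "<" and ">"
redirection.

```python
import subprocess


def tokenize_file(src_path, dst_path, command):
    """Run `command` with src_path on stdin and stdout written to dst_path."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        # stdin=src is the equivalent of `command < src_path` in a shell;
        # stdout=dst is the equivalent of `> dst_path`.
        subprocess.run(command, stdin=src, stdout=dst, check=True)


# Hypothetical usage; adjust the script path, language code and file names:
# tokenize_file(
#     "train.en",
#     "train.tok.en",
#     ["perl", "mosesdecoder/scripts/tokenizer/tokenizer.perl", "-l", "en"],
# )
```

Because check=True is set, a non-zero exit status from the command raises
an exception instead of silently leaving an empty output file.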

Hieu Hoang
http://statmt.org/hieu



On Sun, 12 Apr 2020 at 10:27, Justin Cunningham <just1br...@outlook.com>
wrote:

> Hi,
>
> I’m currently working on a Neural Machine Translator but I am quite new to
> it all. I am trying to tokenise my files in Linux using the following shell
> script (https://github.com/JustCunn/IrishNMT/blob/master/GaeilgePrepare.sh)
> and these files:
>
> http://opus.nlpl.eu/download.php?f=EUbookshop/v2/moses/en-ga.txt.zip
> <http://opus.nlpl.eu/download.php?f=EUbookshop/v2/moses/de-fr.txt.zip>
> http://opus.nlpl.eu/download.php?f=QED/v2.0a/moses/en-ga.txt.zip
>
> But it just won’t work. Sometimes it skips the step, and other times it
> gets stuck on the ‘Tokenizer... number of threads...’ message. For context,
> they are all plain text files. Am I not formatting the text correctly?
>
> I’d appreciate it if someone could help me with this, as it would be a huge
> help in my understanding of it all.
>
> Thanks,
> Justin
> _______________________________________________
> Moses-support mailing list
> Moses-support@mit.edu
> http://mailman.mit.edu/mailman/listinfo/moses-support
>
