[Moses-support] Tokenization

Justin Cunningham Sun, 12 Apr 2020 10:29:46 -0700

Hi,

I’m currently working on a neural machine translation (NMT) system, but I am
quite new to it all. I am trying to tokenise my files on Linux using the
following shell script
(https://github.com/JustCunn/IrishNMT/blob/master/GaeilgePrepare.sh) and these
files:


http://opus.nlpl.eu/download.php?f=EUbookshop/v2/moses/en-ga.txt.zip
http://opus.nlpl.eu/download.php?f=QED/v2.0a/moses/en-ga.txt.zip

But it just won’t work. Sometimes the tokenisation step is skipped; other times
it just gets stuck at the ‘Tokenizer... number of threads...’ message. For
context, the inputs are all plain text files. Am I not formatting the text
correctly?
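
One way to rule out a formatting problem before tokenising is to check that
both sides of the corpus decode as UTF-8, use Unix line endings, and have the
same number of lines. The following is only a minimal sketch and is not part of
the linked script; the function name check_parallel_files and the file names in
the final comment are assumptions made for illustration.

```python
from pathlib import Path


def check_parallel_files(src_path, tgt_path):
    """Collect basic sanity facts about the two sides of a parallel corpus.

    Reads each file fully into memory, which is fine for a quick check on
    moderately sized corpora.
    """
    report = {}
    for label, path in (("source", src_path), ("target", tgt_path)):
        raw = Path(path).read_bytes()
        try:
            text = raw.decode("utf-8")
            utf8_ok = True
        except UnicodeDecodeError:
            # Keep going so the line counts are still reported.
            text = raw.decode("utf-8", errors="replace")
            utf8_ok = False

        # Split on "\n" only, so one corpus line is one sentence.
        lines = text.split("\n")
        if lines and lines[-1] == "":
            lines.pop()  # drop the empty piece after a trailing newline

        report[label] = {
            "utf8_ok": utf8_ok,
            "lines": len(lines),
            "empty_lines": sum(1 for line in lines if not line.strip()),
            "crlf": "\r\n" in text,  # Windows line endings present?
        }

    # A parallel corpus needs exactly one target line per source line.
    report["aligned"] = report["source"]["lines"] == report["target"]["lines"]
    return report


# Hypothetical usage (file names are assumed, adjust to the unzipped corpus):
# print(check_parallel_files("EUbookshop.en-ga.en", "EUbookshop.en-ga.ga"))
```

If the report shows invalid UTF-8, Windows line endings, or mismatched line
counts, the files are worth cleaning before looking at the tokenizer itself.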

I’d appreciate it if someone could help me with this, as it would be a huge
help in my understanding of it all.

Thanks,
Justin

_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support
