Hi, I’m currently working on a neural machine translator, but I’m quite new to it all. I am trying to tokenise my files on Linux using the following shell script (https://github.com/JustCunn/IrishNMT/blob/master/GaeilgePrepare.sh) and these files:
http://opus.nlpl.eu/download.php?f=EUbookshop/v2/moses/en-ga.txt.zip
http://opus.nlpl.eu/download.php?f=QED/v2.0a/moses/en-ga.txt.zip

But it just won’t work. Sometimes the script skips the tokenisation step; other times it just gets stuck at the “Tokenizer... number of threads...” message. For context, they are all plain text files. Am I not formatting the text correctly? I’d appreciate it if someone could help me with this, as it would go a long way towards my understanding of it all.

Thanks,
Justin
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support