Hi Folks, I managed to generate my first language pack today, based on the Hiero model. It's 4.8GB in size, so I have made it available via my home.apache.org public space at [0]. It is uploading right now and will take a wee while. I would like some community review so we can assess the quality of what has been generated. In addition, there are a number of immediate things I am struggling with.

Firstly, the following were not present after running bundler.py:

- prepare.sh, which is a baseline requirement for running the tests as detailed in the auto-generated README.
- the entire 'scripts' directory!!! This means that no utility processing can be undertaken at all.

I know that both of the above are essential requirements, so I added them from a different language pack, increased the default maximum memory usage and also augmented the README with some details regarding the dataset used to generate the language pack. In comparison to the es --> en language pack posted by Matt, due to the fact that no scripts directory was generated, this language pack does not have the scripts/release directory either. I am not sure how this was generated.

Over and above what I've detailed so far, there is one blocking issue for me: when I submit Russian text to the Joshua server, it just spits back out the same Russian text! I can see the decoder logging to stdout, but I can only assume that no decoding is actually taking place.

Can you guys please review the language pack and provide feedback on the configuration, some of the scores which have been generated and even the BLEU score? I have everything local and also backed up, so I can provide all of it as well as the exact commands I invoked to generate the entire thing from start to finish.

Cheers troops.

[0] http://home.apache.org/~lewismc/language-pack-ru-en-2016-10-28.tar.gz

--
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney