Hi Folks,
I managed to generate my first language pack today, based on the Hiero model.
It's 4.8GB in size so I have made it available via my home.apache.org
public space at [0]. Right now it is uploading and will take a wee while.
I would like some community review of the quality of what has been
generated. In addition, there are a number of immediate issues I am
struggling with.

Firstly, the following files were not present after running bundler.py:

   - prepare.sh: this is a baseline requirement for running the tests, as
   detailed in the auto-generated README.
   - the entire 'scripts' directory!!! This means that no utility
   processing can be done at all.

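For anyone reproducing this, a quick way to spot the gap is to check an unpacked pack for those entries. This is a minimal sketch, not part of bundler.py; the function name and the required-entry list are assumptions based only on the two items above.

```python
from pathlib import Path

# Hypothetical list: only the two entries reported missing above.
REQUIRED_ENTRIES = ["prepare.sh", "scripts"]


def missing_entries(pack_dir, required=REQUIRED_ENTRIES):
    """Return the required entries absent from an unpacked language pack directory."""
    root = Path(pack_dir)
    # Path.exists() is true for both files (prepare.sh) and directories (scripts/).
    return [name for name in required if not (root / name).exists()]
```

Calling missing_entries("language-pack-ru-en-2016-10-28") on the extracted tarball would return an empty list only if both entries are present.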
Because both of the above are essential requirements, I copied them over
from a different language pack. I also increased the default maximum memory
usage and added some details to the README about the dataset used to
generate the language pack.

Compared with the es --> en language pack posted by Matt, this language pack
also lacks the scripts/release directory, because no scripts directory was
generated at all. I am not sure how that directory was generated.

Over and above what I've detailed so far, there is one blocking issue for
me: when I submit Russian text to the Joshua server, it just spits the same
Russian text back out! I can see the decoder logging to stdout, but I can
only assume that no decoding is actually taking place.

Can you guys please review the language pack and provide feedback on the
configuration, on some of the scores that were generated, and even on the
BLEU score? I have everything stored locally and backed up, so I can provide
all of it, as well as the exact commands I invoked to generate the entire
thing from start to finish.
Cheers troops.

[0] http://home.apache.org/~lewismc/language-pack-ru-en-2016-10-28.tar.gz

-- 
http://home.apache.org/~lewismc/
@hectorMcSpector
http://www.linkedin.com/in/lmcgibbney
