GitHub user mjpost commented on the pull request:

    https://github.com/apache/incubator-joshua/commit/805b643187e07c1d4dcd5047c3ac2dfa0a84e256#commitcomment-17176031

This is a big bugfix, with the effect that class-based LMs now work. The class map file has entries such as:

    current 0101000
    fish 101

Before the fix, the class values were treated as integers, which is limiting and wrong (for example, you couldn't have a class-based LM using parts of speech). Now the class values are properly part of the vocabulary.

The OOV_id should not be 10; I'll fix that later today.

This should all be added to StateMinimizingLanguageModel, too. The vocabularies are denser but the contexts tend to be longer, so it's possible they could be used. I'll add that later, too.

(Have you used the class-based LM code? It didn't work before but does now; I am seeing half-point BLEU upticks on my test sets.)
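To illustrate the difference described above, here is a minimal Python sketch of reading a class map with class labels stored as vocabulary entries rather than parsed as integers. It is not Joshua's code: the `Vocabulary` class, the `read_class_map` function, and the `swim VB` entry are hypothetical, added only to show a part-of-speech style label that integer parsing could not represent.

```python
# Hypothetical sketch; not Joshua's actual implementation.

class Vocabulary:
    """Minimal string-to-id table (a stand-in for a decoder vocabulary)."""

    def __init__(self):
        self._ids = {}
        self._words = []

    def id(self, token):
        # Assign the next free id the first time a token is seen.
        if token not in self._ids:
            self._ids[token] = len(self._words)
            self._words.append(token)
        return self._ids[token]

    def word(self, idx):
        return self._words[idx]


def read_class_map(lines, vocab):
    """Parse 'word class' lines, keeping each class label as a vocabulary entry."""
    class_map = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines
        word, label = fields
        class_map[vocab.id(word)] = vocab.id(label)
    return class_map


vocab = Vocabulary()
# The first two entries come from the comment; "swim VB" is an invented example.
entries = ["current 0101000", "fish 101", "swim VB"]
class_map = read_class_map(entries, vocab)

current_class = vocab.word(class_map[vocab.id("current")])
fish_class = vocab.word(class_map[vocab.id("fish")])
swim_class = vocab.word(class_map[vocab.id("swim")])
print(current_class, fish_class, swim_class)
```

Because the labels are treated as opaque strings, a label such as `VB` is handled the same way as `101`, whereas an integer parse (for instance Python's `int("VB")`) would fail on it.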