GitHub user mjpost commented on the pull request:

    
https://github.com/apache/incubator-joshua/commit/805b643187e07c1d4dcd5047c3ac2dfa0a84e256#commitcomment-17176031
  
    This is a big bugfix: class-based LMs now work. The class map file has 
entries such as:
    
        current 0101000
        fish 101
    
    Before the fix, the class values were treated as integers, which is 
limiting and wrong (for example, you couldn't build a class-based LM over 
part-of-speech tags). Now the class values are properly part of the vocabulary. 
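    
    To illustrate the difference, here is a minimal sketch of reading a class 
map in which the class label is interned as a vocabulary string rather than 
parsed as a number. It is not the code from the commit: `ClassMapSketch`, 
`Vocab`, and `load` are hypothetical names, and the toy `Vocab` only stands in 
for the decoder's real vocabulary. Parsing the labels with `Integer.parseInt` 
would turn "0101000" into 101000 and would throw on a tag like "NN"; interning 
keeps every label intact.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ClassMapSketch {

  /** Toy string-to-id vocabulary (hypothetical; stands in for the decoder's real one). */
  static final class Vocab {
    private final Map<String, Integer> ids = new HashMap<>();
    private final List<String> words = new ArrayList<>();

    /** Returns the id for s, assigning the next free id on first sight. */
    int id(String s) {
      Integer existing = ids.get(s);
      if (existing != null) {
        return existing;
      }
      int id = words.size();
      ids.put(s, id);
      words.add(s);
      return id;
    }

    /** Returns the string that was interned under the given id. */
    String word(int id) {
      return words.get(id);
    }
  }

  /** Reads "word class" lines and maps word id to class id, both interned as strings. */
  static Map<Integer, Integer> load(Reader in, Vocab vocab) throws IOException {
    Map<Integer, Integer> wordToClass = new HashMap<>();
    BufferedReader reader = new BufferedReader(in);
    String line;
    while ((line = reader.readLine()) != null) {
      String[] fields = line.trim().split("\\s+");
      if (fields.length != 2) {
        continue; // skip blank or malformed lines
      }
      // The class label is interned like any other token instead of being parsed
      // as an integer, so "0101000", "101", and "NN" are all valid labels.
      wordToClass.put(vocab.id(fields[0]), vocab.id(fields[1]));
    }
    return wordToClass;
  }

  public static void main(String[] args) throws IOException {
    Vocab vocab = new Vocab();
    String classMap = "current 0101000\nfish 101\ndog NN\n";
    Map<Integer, Integer> map = load(new StringReader(classMap), vocab);
    for (String w : new String[] {"current", "fish", "dog"}) {
      System.out.println(w + " -> " + vocab.word(map.get(vocab.id(w))));
    }
  }
}
```

    With the sample input above, each word maps back to its original label, 
including the leading zero in "0101000" and the non-numeric "NN".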
    
    The OOV_id should not be 10; I'll fix that later today.
    
    This should all be added to StateMinimizingLanguageModel, too. The 
vocabularies are denser but the contexts tend to be longer, so it's possible 
they could be used. I'll add that later, too.
    
    (Have you used the class-based LM code? It didn't work before but does 
now; I am seeing half-point BLEU upticks on my test sets.)


