
I've been trying to train a language model using the following command:

    /opt/model-builder/mosesdecoder/bin/lmplz -o 5 -S 80% -T /tmp < lm_data.en > model.lm

But I'm getting the following error:

    === 1/5 Counting and sorting n-grams ===
    Reading /opt/model-builder/training/lm_data.en


    Unigram tokens 21187448 types 117756
    === 2/5 Calculating and sorting adjusted counts ===
    Chain sizes: 1:1413072 2:5151762432 3:9659554816 4:15455287296
    terminate called after throwing an instance of
    what(): /opt/model-builder/mosesdecoder/lm/builder/
    in void lm::builder::{anonymous}::StatCollector::CalculateDiscounts(const lm::builder::DiscountConfig&) threw BadDiscountException because `discounts_[i].amount[j] < 0.0 || discounts_[i].amount[j] > j'.
    ERROR: 5-gram discount out of range for adjusted count 2: -6.80247

The data I'm training on comes from the OPUS project. I found some
references online to this kind of error occurring when there isn't enough
training data, but I think I have enough data, and I have previously trained
on far less (including a subset of my current data):

    $ wc lm_data.en
    1874495 21187448 96148754 lm_data.en
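
As a sanity check on the data, here is a rough sketch of how I understand the discount in the error message to be computed. It assumes lmplz uses the closed-form Chen and Goodman estimates for modified Kneser-Ney, where n_k is the number of distinct 5-grams seen exactly k times, Y = n_1 / (n_1 + 2 n_2) and D_k = k - (k + 1) Y n_{k+1} / n_k. The function names, the whitespace tokenization and the `<s>`/`</s>` padding are my own simplifications, not lmplz's actual code.

```python
from collections import Counter


def ngram_counts(lines, order=5):
    """Count n-grams of the given order over whitespace-tokenized lines.

    Each line is padded with <s> and </s>; this is a simplification and
    may not match lmplz's boundary handling exactly.
    """
    counts = Counter()
    for line in lines:
        tokens = ["<s>"] + line.split() + ["</s>"]
        for i in range(len(tokens) - order + 1):
            counts[tuple(tokens[i:i + order])] += 1
    return counts


def kn_discounts(counts):
    """Return {k: D_k} for k = 1, 2, 3 from the count-of-counts.

    Assumed formula (Chen and Goodman): Y = n1 / (n1 + 2*n2),
    D_k = k - (k + 1) * Y * n_{k+1} / n_k.
    """
    n = Counter(counts.values())  # n[k] = number of n-grams seen k times
    if min(n[1], n[2], n[3]) == 0:
        raise ValueError("need n-grams seen once, twice and three times")
    y = n[1] / (n[1] + 2 * n[2])
    return {k: k - (k + 1) * y * n[k + 1] / n[k] for k in (1, 2, 3)}


def out_of_range(discounts):
    """Return the k whose discount violates 0 <= D_k <= k."""
    return [k for k, d in discounts.items() if d < 0.0 or d > k]
```

Calling `kn_discounts(ngram_counts(f))` on an open file handle for (a sample of) `lm_data.en` should show whether D_2 goes negative for 5-grams. Under this formula that happens when 3 Y n_3 > 2 n_2, that is, when 5-grams seen three times are unexpectedly common relative to those seen twice. Holding every 5-gram of the full file in memory is expensive, so a sample is probably more practical.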

Any ideas what might be causing the problem?

