Morning, I've been trying to train a language model using the following command:
/opt/model-builder/mosesdecoder/bin/lmplz -o 5 -S 80% -T /tmp < lm_data.en > model.lm

But I'm getting the following error:

=== 1/5 Counting and sorting n-grams ===
Reading /opt/model-builder/training/lm_data.en
----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
****************************************************************************************************
Unigram tokens 21187448 types 117756
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:1413072 2:5151762432 3:9659554816 4:15455287296 5:22538960896
terminate called after throwing an instance of 'lm::builder::BadDiscountException'
what():  /opt/model-builder/mosesdecoder/lm/builder/adjust_counts.cc:61 in void lm::builder::{anonymous}::StatCollector::CalculateDiscounts(const lm::builder::DiscountConfig&) threw BadDiscountException because `discounts_[i].amount[j] < 0.0 || discounts_[i].amount[j] > j'.
ERROR: 5-gram discount out of range for adjusted count 2: -6.80247

The data I'm training on comes from the OPUS project. I found some references online to this failing when there isn't enough training data, but I think I have sufficient data: I've previously trained on a lot less, and even on a subset of my current data.

$ wc lm_data.en
 1874495 21187448 96148754 lm_data.en

Any ideas what might be causing the problem?

James
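For what it's worth, here is a rough sketch of where a negative discount can come from. lmplz estimates modified Kneser-Ney discounts from the counts-of-counts (how many n-grams have adjusted count 1, 2, 3, 4), following the Chen & Goodman formula; the names and example numbers below are illustrative assumptions, not taken from the actual adjust_counts.cc code.

```python
# Sketch of the modified Kneser-Ney discount estimate (Chen & Goodman style)
# that lmplz sanity-checks; a hypothetical stand-in, not the real implementation.

def kn_discounts(n):
    """n[k] = number of n-grams with adjusted count exactly k (k = 1..4).
    Returns [D_1, D_2, D_3]; lmplz aborts if any D_k < 0 or D_k > k."""
    y = n[1] / (n[1] + 2 * n[2])
    return [k - (k + 1) * y * n[k + 1] / n[k] for k in (1, 2, 3)]

# Well-behaved counts-of-counts: all discounts land in [0, k].
print(kn_discounts({1: 1000, 2: 400, 3: 200, 4: 100}))

# Skewed counts-of-counts (n_3 >> n_2, the sort of shape that heavily
# duplicated training lines can produce): D_2 goes negative, which is
# the same out-of-range condition as the error above.
print(kn_discounts({1: 100, 2: 10, 3: 500, 4: 50}))
```

So if the corpus is large, the discounts can still blow up when the count-of-count distribution is unusual, e.g. from many repeated sentences, rather than from the corpus simply being too small.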
_______________________________________________
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support