Hi Ken,

Thanks very much for your help.

I have run the same data with an order of 3 and it still produces the same error. I will kick off a test using SRILM to see if we get the error there too. Other than long product names, are there any other patterns you think we should look out for?

Yes, I am running on a 64-bit system.

Thanks,
Darragh

--
Darragh Whelan | Software Engineer | +353 180 31922
Oracle - WPTG
East Point Business Park
Clontarf
Dublin
Ireland
Oracle is committed to developing practices and products that help protect the environment

-----Original Message-----
From: Kenneth Heafield [mailto:[email protected]]
Sent: 21 February 2013 15:34
To: [email protected]
Subject: Re: [Moses-support] lmplz error: ERROR: 5-gram discount out of range for adjusted count 2: -5.03664

Hi,

Your command looks reasonable.

There are some differences in what we're calculating. MITLM says "Instead of estimating discount factors f [i] = D(i) from count statistics, we can also tune them to minimize the development set perplexity, which has been observed to improve performance [1]." This is in fact better than what lmplz does. IRSTLM isn't computing canonical modified Kneser-Ney. Your data seems close to passing (it passed for orders 1-4), so my guess would be that they're using a different formula (likely) or do not check for this condition.

SRILM should encounter an error on this data. If it doesn't, then I'll dig into the code.

If you pipe the data through sort -u, how much smaller does it get, and does that fix the issue? Does it contain e.g. long product names that might mess up 5-gram statistics?

I guess the reasonable thing to do in this situation would be to turn off the "modified" part of modified Kneser-Ney. Instead of separate discounts for counts of 1, 2, and 3+, it would use one discount for all counts, at least for 5-grams in your case. I could implement this if you want.

Also, you're running a 64-bit machine, yes?

Kenneth

On 02/21/13 07:05, Darragh Whelan wrote:
> Hi everyone,
>
> I am trying to build a language model using lmplz but I am running
> into some problems.
>
> The error is:
>
> adjust_counts.cc:50 in void
> lm::builder::<unnamed>::StatCollector::CalculateDiscounts() threw
> BadDiscountException because `discounts_[i].amount[j] < 0.0 ||
> discounts_[i].amount[j] > j'.
>
> ERROR: 5-gram discount out of range for adjusted count 2: -5.03664
>
> Aborted
>
> I have followed the Moses manual closely and the command I used to run was:
>
> /bin/lmplz -o 5 -S 80% -T /tmp/ < /data/lm.fr > /engines/FR_FR/lm/lmplz.arpa
>
> I don't think it is a problem with our data as I have been able to
> build a language model using IRSTLM and MITLM successfully.
>
> Could anyone please help us with getting lmplz to work with this data?
>
> Thanks,
>
> Darragh
>
> --
> Oracle <http://www.oracle.com/>
> Darragh Whelan | Software Engineer | +353 180 31922
> Oracle - WPTG
> East Point Business Park
> Clontarf
> Dublin
> Ireland
> Green Oracle <http://www.oracle.com/commitment>
> Oracle is committed to developing practices and products that help
> protect the environment

_______________________________________________
Moses-support mailing list
[email protected]
http://mailman.mit.edu/mailman/listinfo/moses-support
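
For readers following the thread: the check quoted in the error message (`discounts_[i].amount[j] < 0.0 || discounts_[i].amount[j] > j`) can be illustrated with the standard closed-form modified Kneser-Ney discounts from Chen and Goodman, computed from the counts-of-counts n1..n4 (the number of n-grams with adjusted count 1, 2, 3, 4). The sketch below is only an illustration under that assumption. It is not lmplz's code, the function names are hypothetical, and the counts are made up rather than taken from Darragh's data. It shows how a surplus of count-3 n-grams relative to count-2 n-grams (which heavily repeated strings could plausibly cause) drives the discount for adjusted count 2 below zero.

```python
def modified_kn_discounts(n1, n2, n3, n4):
    """Closed-form modified Kneser-Ney discounts for adjusted counts 1, 2, 3+.

    n1..n4 are counts-of-counts for one n-gram order (hypothetical inputs).
    """
    if min(n1, n2, n3) <= 0:
        raise ValueError("n1, n2 and n3 must all be positive")
    y = n1 / (n1 + 2.0 * n2)
    return {
        1: 1.0 - 2.0 * y * n2 / n1,
        2: 2.0 - 3.0 * y * n3 / n2,
        3: 3.0 - 4.0 * y * n4 / n3,
    }


def out_of_range(discounts):
    """Adjusted counts j whose discount D_j violates 0 <= D_j <= j."""
    return [j for j, d in sorted(discounts.items()) if d < 0.0 or d > j]


# Illustrative counts only: a typical decaying profile, and a skewed one
# where count-3 n-grams far outnumber count-2 n-grams.
healthy = modified_kn_discounts(n1=1000, n2=400, n3=200, n4=120)
skewed = modified_kn_discounts(n1=1000, n2=100, n3=400, n4=300)

print("healthy:", healthy, "bad:", out_of_range(healthy))
print("skewed: ", skewed, "bad:", out_of_range(skewed))
```

With the skewed counts only the discount for adjusted count 2 falls outside its range, which matches the shape of the reported error. Using a single discount for all counts, as Kenneth suggests, would sidestep the n3/n2 ratio entirely.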
