Thanks, Kenneth. Here's what I get now.

$ ~/mosesdecoder.multisource.git/bin/lmplz -o 2 <<< "that is what happens ?
> cssd has nothing more or voldemort or pastries in prague ."
=== 1/5 Counting and sorting n-grams ===
Reading /tmp/sh-thd-1452698150 (deleted)

----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
tcmalloc: large alloc 29442056192 bytes == 0x1c74000 @
tcmalloc: large alloc 78512136192 bytes == 0x6de346000 @

****************************************************************************************************
Unigram tokens 16 types 18
=== 2/5 Calculating and sorting adjusted counts ===
Chain sizes: 1:216 2:107979354931
tcmalloc: large alloc 107979358208 bytes == 0x192a648000 @
terminate called after throwing an instance of
'lm::builder::BadDiscountException'
  what():
/home/lanes/mosesdecoder.multisource.git/lm/builder/adjust_counts.cc:53 in
void lm::builder::{anonymous}::StatCollector::CalculateDiscounts(const
lm::builder::DiscountConfig&) threw BadDiscountException because `s.n[j] ==
0'.
Could not calculate Kneser-Ney discounts for 1-grams with adjusted count 4
because we didn't observe any 1-grams with adjusted count 3; Is this small
or artificial data?
Try deduplicating the input.  To override this error for e.g. a
class-based model, rerun with --discount_fallback
Aborted (core dumped)
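
For anyone reading along, the condition in that message can be illustrated with the textbook closed-form estimate of the modified Kneser-Ney discounts (Chen and Goodman), which divides by the counts-of-counts n1, n2 and n3. The Python below is only a sketch of that formula under stated assumptions, not the lmplz implementation: the function name `kneser_ney_discounts`, the input format and the use of `ValueError` are all hypothetical.

```python
from collections import Counter


def kneser_ney_discounts(adjusted_counts):
    """Return modified Kneser-Ney discounts {1: D1, 2: D2, 3: D3+}.

    adjusted_counts holds one adjusted count per distinct n-gram of a
    single order. Hypothetical helper, not part of lmplz.
    """
    # n[j] = number of distinct n-grams whose adjusted count is exactly j.
    n = Counter(adjusted_counts)

    # n[1], n[2] and n[3] are denominators below, so none may be zero.
    for j in (1, 2, 3):
        if n[j] == 0:
            raise ValueError(
                f"cannot compute discount for adjusted count {j + 1}: "
                f"no n-grams with adjusted count {j}"
            )

    y = n[1] / (n[1] + 2 * n[2])
    # D_j = j - (j + 1) * Y * n[j+1] / n[j]
    return {j: j - (j + 1) * y * n[j + 1] / n[j] for j in (1, 2, 3)}


# Counts-of-counts n1=4, n2=2, n3=1, n4=1: every denominator is non-zero.
print(kneser_ney_discounts([1, 1, 1, 1, 2, 2, 3, 4]))

# No n-gram with adjusted count 3, so the check fails.
try:
    kneser_ney_discounts([1, 1, 2, 4])
except ValueError as err:
    print("error:", err)
```

With very small or artificial input, one of those counts-of-counts can easily be zero, which is the situation the message above describes. The message itself names --discount_fallback as the way to override the error.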



On Tue, Jan 12, 2016 at 5:40 PM, Kenneth Heafield <mo...@kheafield.com>
wrote:

> Pushed the fix from kenlm master in October to Moses master.
>
> On 01/12/2016 10:34 PM, Lane Schwartz wrote:
> > Steps to reproduce this error:
> >
> >     $ ~/mosesdecoder.git/bin/lmplz -o 2 <<< "that is what happens ? cssd
> >     has nothing more or voldemort or pastries in prague ."
> >     === 1/5 Counting and sorting n-grams ===
> >     Reading /tmp/sh-thd-107574999377 (deleted)
> >
> >     ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> >     tcmalloc: large alloc 29442056192 bytes == 0x2ae2000 @
> >     tcmalloc: large alloc 78512136192 bytes == 0x6df1b4000 @
> >
> >     ****************************************************************************************************
> >     Unigram tokens 16 types 18
> >     === 2/5 Calculating and sorting adjusted counts ===
> >     Chain sizes: 1:216 2:107979354931
> >     tcmalloc: large alloc 107979358208 bytes == 0x192b4b6000 @
> >     lmplz: ./util/fixed_array.hh:104: T&
> >     util::FixedArray<T>::operator[](std::size_t) [with T =
> >     lm::NGramStream<lm::builder::BuildingPayload>; std::size_t = long
> >     unsigned int]: Assertion `i < size()' failed.
> >
> >
> >
> >
> > On Wed, Sep 30, 2015 at 11:41 AM, Kenneth Heafield <mo...@kheafield.com
> > <mailto:mo...@kheafield.com>> wrote:
> >
> >     That's bad.  Would you mind sending me privately a minimal example of
> >     the data that reproduces the problem?
> >
> >     Kenneth
> >
> >     On 09/30/2015 04:29 PM, Alex Martinez wrote:
> >     > Hello,
> >     > Today I pulled the Moses code and recompiled, and some experiments (EMS)
> >     > that were already working are now failing at the LM training step with
> >     > the following error:
> >     >
> >     > Executing: /opt/moses/bin/lmplz --text
> >     > /home/alexmc/devel/toydata/process/lm/nc=pos.factored.1 --order 5 --arpa
> >     > /home/alexmc/devel/toydata/process/lm/nc=pos.lm.1 --discount_fallback
> >     > === 1/5 Counting and sorting n-grams ===
> >     > Reading /mnt/a62/devel/toydata/process/lm/nc=pos.factored.1
> >     >
> >     > ----5---10---15---20---25---30---35---40---45---50---55---60---65---70---75---80---85---90---95--100
> >     > tcmalloc: large alloc 4753956864 bytes == 0x1f7c000 @
> >     > tcmalloc: large alloc 22185107456 bytes == 0x11d536000 @
> >     >
> >     > ****************************************************************************************************
> >     > Unigram tokens 2433135 types 47
> >     > === 2/5 Calculating and sorting adjusted counts ===
> >     > Chain sizes: 1:564 2:2630656000 3:4932480000 4:7891967488 5:11509120000
> >     > tcmalloc: large alloc 11509121024 bytes == 0x1f7c000 @
> >     > tcmalloc: large alloc 2630656000 bytes == 0x2aff70000 @
> >     > tcmalloc: large alloc 4932485120 bytes == 0x34cc3a000 @
> >     > tcmalloc: large alloc 7891968000 bytes == 0x64933c000 @
> >     > lmplz: ./util/fixed_array.hh:104: T&
> >     > util::FixedArray<T>::operator[](std::size_t) [with T =
> >     > lm::NGramStream<lm::builder::BuildingPayload>; std::size_t = long
> >     > unsigned int]: Assertion `i < size()' failed.
> >     >
> >     > I'm running a Linux server with Ubuntu 15.04.
> >     >
> >     > Any help will be appreciated
> >     >
> >     > Alex Martínez
> >     >
> >     >
> >     > _______________________________________________
> >     > Moses-support mailing list
> >     > Moses-support@mit.edu <mailto:Moses-support@mit.edu>
> >     > http://mailman.mit.edu/mailman/listinfo/moses-support
> >     >
> >
> >
> >
> >
> > --
> > When a place gets crowded enough to require ID's, social collapse is not
> > far away.  It is time to go elsewhere.  The best thing about space travel
> > is that it made it possible to go elsewhere.
> >                 -- R.A. Heinlein, "Time Enough For Love"
>



-- 
When a place gets crowded enough to require ID's, social collapse is not
far away.  It is time to go elsewhere.  The best thing about space travel
is that it made it possible to go elsewhere.
                -- R.A. Heinlein, "Time Enough For Love"