Ok, thank you Marcin.
On Fri, May 10, 2019 at 11:53 AM Marcin Junczys-Dowmunt
wrote:
> Hi,
>
> Yes, a smaller phrase table should help. I wrote the table, but that was
> in 2012 and I cannot really remember what goes on in there. I think making
> sure that you do not have too many target phrases per source phrase should
> help.
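
A minimal sketch of that suggestion (capping the number of target phrases kept per source phrase), for reference. It assumes the plain-text Moses phrase table layout `source ||| target ||| scores ||| ...`, that the table is sorted by source phrase, and that the third score is the direct translation probability p(e|f). The function names and the default cap of 20 are hypothetical, chosen only for illustration; this is not part of processPhraseTableMin.

```python
import heapq
from itertools import groupby


def source_of(line):
    """Return the source phrase, i.e. the first ' ||| '-separated field."""
    return line.split(" ||| ", 1)[0]


def score_of(line, score_index=2):
    """Return one score from the third field (assumed: index 2 is p(e|f))."""
    fields = line.rstrip("\n").split(" ||| ")
    return float(fields[2].split()[score_index])


def limit_targets(lines, max_targets=20, score_index=2):
    """Yield at most max_targets entries per source phrase, best score first.

    Assumes entries for the same source phrase are adjacent (sorted table).
    """
    for _, group in groupby(lines, key=source_of):
        best = heapq.nlargest(
            max_targets, group, key=lambda l: score_of(l, score_index)
        )
        yield from best
```

For a gzipped table, the lines could be streamed with `gzip.open(path, "rt", encoding="utf-8")` and the result written to a new gzipped file, so the whole table never has to fit in memory.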
>
>
>
> *From: *He Shiming
> *Sent: *Thursday, May 9, 2019 8:49 PM
> *To: *moses-support@mit.edu
> *Subject: *[Moses-support] processPhraseTableMin Cannot encode numbers
> larger than 268435455
>
>
>
> Hi,
>
>
>
> I'm training a Chinese-to-English phrase-based model, using 33 million
> sentence pairs. My phrase table is 90GB gzipped, and the reordering table
> is 27GB gzipped. When running processPhraseTableMin, it dies in step 3
> because of the following error:
>
>
>
> Intermezzo: Calculating Huffman code sets
> Creating Huffman codes for 1786817 target phrase symbols
> Creating Huffman codes for 871265 scores
> Creating Huffman codes for 18018117 scores
> Creating Huffman codes for 827039 scores
> Creating Huffman codes for 17861459 scores
> Creating Huffman codes for 50 alignment points
>
> Pass 3/3: Compressing target phrases
> ..[500]
> ..[34500]
> terminate called after throwing an instance of 'util::Exception'
> what(): moses/TranslationModel/CompactPT/ListCoders.h:179 in static
> void Moses::Simple9::EncodeSymbol(Moses::Simple9::uint&, InIt, InIt) [with
> InIt = unsigned int*; Moses::Simple9::uint = unsigned int] threw
> util::Exception because `*it > 268435455'.
> You are trying to encode 436766721 with Simple9. Cannot encode numbers
> larger than 268435455 (2^28-1)
> Aborted (core dumped)
>
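
For context on the limit in that message: Simple9, as it is commonly described, packs integers into 32-bit words, using 4 bits as a layout selector and the remaining 28 bits for data, so no single value can exceed 2^28-1. The sketch below only reproduces that arithmetic for the two numbers in the error. The names are hypothetical, the layout list is the textbook one and may differ in detail from the Moses implementation, and it says nothing about what the value 436766721 represents inside the phrase table.

```python
# Textbook Simple9 layouts: (values per 32-bit word, bits per value).
# Every layout uses at most 28 data bits; 4 bits select the layout.
SIMPLE9_LAYOUTS = [
    (28, 1), (14, 2), (9, 3), (7, 4), (5, 5), (4, 7), (3, 9), (2, 14), (1, 28),
]

# Largest value that fits in the widest layout (one 28-bit value per word).
SIMPLE9_MAX = 2**28 - 1


def fits_simple9(value):
    """Return True if a non-negative integer fits in 28 data bits."""
    return 0 <= value <= SIMPLE9_MAX


def bits_needed(value):
    """Number of bits needed to store a non-negative integer (at least 1)."""
    return max(1, value.bit_length())


if __name__ == "__main__":
    for v in (SIMPLE9_MAX, 436766721):
        print(v, "needs", bits_needed(v), "bits; fits:", fits_simple9(v))
```

The value from the log needs 29 bits, one more than any layout provides, which is consistent with the exception being raised.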
>
>
> Is my phrase table too big? Pruning seems to have only removed 0.1% of the
> phrases. Is retraining using fewer pairs my only option?
>
>
>
> --
>
> Best regards,
> He Shiming
>
>
>
--
Best regards,
He Shiming
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support