Re: [Moses-support] processPhraseTableMin Cannot encode numbers larger than 268435455

2019-05-09 Thread He Shiming
Ok, thank you Marcin.

-- 
Best regards,
He Shiming
___
Moses-support mailing list
Moses-support@mit.edu
http://mailman.mit.edu/mailman/listinfo/moses-support


Re: [Moses-support] processPhraseTableMin Cannot encode numbers larger than 268435455

2019-05-09 Thread Marcin Junczys-Dowmunt
Hi,
Yes, a smaller phrase table should help. I wrote the table, but that was in 
2012 and I cannot really remember what goes on in there. I think making sure 
that you do not have too many target phrases per source phrase should help. 
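
One way to cap the number of target phrases per source phrase before compacting is sketched below. This is only an illustration, not a Moses tool: it assumes the usual text phrase-table layout (`source ||| target ||| scores ||| alignment ||| counts`), that entries for the same source phrase are adjacent, and that the third score is the direct probability p(e|f). The function names and the default `n=20` are hypothetical.

```python
from itertools import groupby

SEP = " ||| "

def source_of(line):
    # First field of a phrase-table line is the source phrase.
    return line.split(SEP, 1)[0]

def score_of(line, score_index=2):
    # Third field holds the space-separated scores; index 2 is
    # ASSUMED to be the direct phrase probability p(e|f).
    fields = line.rstrip("\n").split(SEP)
    return float(fields[2].split()[score_index])

def prune_top_n(lines, n=20, score_index=2):
    """Keep at most n target phrases per source phrase.

    Assumes lines with the same source phrase are adjacent.
    Survivors are emitted in their original order.
    """
    for _, group in groupby(lines, key=source_of):
        entries = list(group)
        ranked = sorted(range(len(entries)),
                        key=lambda i: score_of(entries[i], score_index),
                        reverse=True)
        keep = set(ranked[:n])
        for i, line in enumerate(entries):
            if i in keep:
                yield line

if __name__ == "__main__":
    sample = [
        "a ||| x ||| 0.1 0.1 0.5 0.1 ||| 0-0 ||| 1 1 1\n",
        "a ||| y ||| 0.1 0.1 0.3 0.1 ||| 0-0 ||| 1 1 1\n",
        "a ||| z ||| 0.1 0.1 0.9 0.1 ||| 0-0 ||| 1 1 1\n",
        "b ||| w ||| 0.1 0.1 0.2 0.1 ||| 0-0 ||| 1 1 1\n",
    ]
    for kept in prune_top_n(sample, n=2):
        print(kept, end="")
```

For a table of this size the input would be streamed (for example from a `gzip.open(..., "rt")` handle) rather than held in a list; only one source phrase's entries are in memory at a time. Whether this is enough to avoid the Simple9 error is not confirmed by the thread.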

From: He Shiming
Sent: Thursday, May 9, 2019 8:49 PM
To: moses-support@mit.edu
Subject: [Moses-support] processPhraseTableMin Cannot encode numbers larger than 268435455

Hi,

I'm training a Chinese-to-English phrase-based model, using 33 million sentence 
pairs. My phrase table is 90GB gzipped, and the reordering table is 27GB 
gzipped. When running processPhraseTableMin, it dies in pass 3 with the 
following error:

Intermezzo: Calculating Huffman code sets
        Creating Huffman codes for 1786817 target phrase symbols
        Creating Huffman codes for 871265 scores
        Creating Huffman codes for 18018117 scores
        Creating Huffman codes for 827039 scores
        Creating Huffman codes for 17861459 scores
        Creating Huffman codes for 50 alignment points

Pass 3/3: Compressing target phrases
..[500]
..[34500]
terminate called after throwing an instance of 'util::Exception'
  what():  moses/TranslationModel/CompactPT/ListCoders.h:179 in static void Moses::Simple9::EncodeSymbol(Moses::Simple9::uint&, InIt, InIt) [with InIt = unsigned int*; Moses::Simple9::uint = unsigned int] threw util::Exception because `*it > 268435455'.
You are trying to encode 436766721 with Simple9. Cannot encode numbers larger than 268435455 (2^28-1)
Aborted (core dumped)
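
The limit in the message can be checked with a little arithmetic. As Simple9 is usually described, each 32-bit word spends 4 bits on a selector and leaves 28 bits for data, so the largest single value it can hold is 2^28-1. The sketch below only reproduces that bound and shows that the reported value needs 29 bits; the names are made up for illustration, and it says nothing about what the value stands for inside the compact phrase table.

```python
# Simple9 packs values into 32-bit words: 4 selector bits + 28 data bits.
WORD_BITS = 32
SELECTOR_BITS = 4
PAYLOAD_BITS = WORD_BITS - SELECTOR_BITS      # 28
SIMPLE9_MAX = (1 << PAYLOAD_BITS) - 1         # 2^28 - 1

def fits_simple9(value):
    # True if a single non-negative integer fits in the 28 data bits.
    return 0 <= value <= SIMPLE9_MAX

offending = 436766721  # value reported in the error message

print(SIMPLE9_MAX)               # 268435455
print(offending.bit_length())    # 29
print(fits_simple9(offending))   # False
```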

Is my phrase table too big? Pruning seems to have only removed 0.1% of the 
phrases. Is retraining using fewer pairs my only option?

-- 
Best regards,
He Shiming
